HUMAN DNA SEQUENCES
Background of the Invention 

Current methods for testing pharmacological substances rely on a three-stage approach to drug development. First, candidate compounds are typically screened in some sort of in vitro system, such as inhibition of cancer cell growth. Candidates are then tested in an animal model as a first approximation of systemic effects, including efficacy and toxicity. Compounds that still show promise after these initial in vivo screens are finally tested in humans. Human testing, in turn, typically occurs in three phases: toxicity, preliminary efficacy and efficacy. The entire process can take more than a decade and cost hundreds of millions of dollars. Aside from the monetary costs and protracted time scale, current testing regimes waste the lives of countless laboratory animals and needlessly endanger the lives of human subjects.

A need exists, therefore, for more sophisticated drug screening techniques that can be performed rapidly in vitro. Ideally, these screening techniques will reflect systemic and/or organ-specific responses, so that they provide a reliable indicator of action in the human body. Current techniques, however, tend to use only a single marker or a limited number of markers, thus answering only very simple questions of questionable medical import. For example, a typical in vitro assay may ask whether a lead compound binds a particular receptor that has been implicated in a certain disorder. Such binding is presumed to indicate therapeutic usefulness, but the assay does not even purport to address systemic effects.

Not only are screening techniques for efficacy inadequate; the available toxicity screens are likewise inadequate. Toxicity, at a first level, is usually measured by animal testing. Aside from the complications of in vivo versus in vitro testing, such screens are insufficient because of differences in metabolism, uptake, etc., relative to humans. Thus, improved methods would not only be in vitro-based; they would also be more "human."

With the increasing miniaturization of screening assays and the growing availability of targets for pharmaceutical intervention, there is increasing interest in developing arrays containing large numbers of these targets that can be assayed simultaneously. If such an array contains a large enough population of targets, it can essentially mimic the systemic response. In other words, the array becomes an in vitro surrogate for the human body. The more refined the array, the more accurate its predictive capability. In theory, an array could be constructed that detects all of the known human expression products simultaneously, thereby providing a very reliable indicator of the human response to a given compound. These arrays offer advantages over present in vitro screening systems in that they can assay large numbers of responses simultaneously. They are superior to animal testing because they are more "human" and, thus, more predictive of human responses.

In order to construct such arrays, however, the field needs further human targets. Advantageously, such targets will be provided with additional physiologically relevant information, such as whether the target is expressed in a particular tissue and whether it is related to a known functional class of targets. In this way, the artisan can focus as needed, for example, on tissue-specific or target class-specific effects, thereby obtaining information useful in evaluating efficacy and/or toxicity.

In addition to a need for pharmacological screening targets, there is a need for further pharmacological substances. These substances can be used in formulating medicinal compositions and in treating a wide variety of disorders.

The present invention responds to the aforementioned and other needs in the field by providing a population of novel targets useful, inter alia, in the profiling and medicinal contexts described above.

Summary of the Invention 
It is an object of the invention, therefore, to provide a set of human cDNA clones. Further to this object, the invention provides sequences of human cDNA clones that were isolated from libraries generated from different human tissues.

It is another object of the invention to provide assemblages of targets useful in profiling matrices for screening pharmacological test compounds. According to this object, assemblages comprising different populations of human nucleic acids, proteins and antibodies are provided. In different embodiments, cDNA library-specific assemblages and target family-specific assemblages are provided.

It is a further object of the invention to provide a database of human nucleotide and protein sequences. Further to this object, novel human nucleotide and protein sequences are provided in electronic form. In one embodiment, one or more of these sequences is provided in a searchable database.

It is still another object of the invention to provide biologically active target molecules useful in treating or detecting human disorders. Further to this object, the invention provides nucleic acid and protein molecules that have the capacity to affect disease etiology or symptoms, or that correlate with known disease states. Also further to this object, a database is provided which comprises the disclosed molecules in electronic form.

Detailed Description

The invention results from a need in the art for new human nucleic acids and proteins. This need arises in several contexts. First, there is a need to identify targets for therapeutic intervention. Second, there is a need to identify molecules that may be adversely affected in a therapeutic context, thereby resulting in toxicity. Knowledge of these molecules will aid in the design of new medicaments with enhanced efficacy and decreased toxicity. Finally, the need encompasses human nucleic acids and proteins that have medicinal applicability in their own right.

In view of these needs, the present inventors set out to isolate and sequence human cDNAs from tissue-specific libraries. The resulting cDNAs therefore represent subsets of molecules likely to be targets for therapeutic intervention or for avoiding toxicity. In addition, the inventors divided the molecules into various subcategories, based on suspected functionality, structural similarity, etc., which are of interest from a pharmacological perspective.

GENERAL DESCRIPTION OF THE INVENTIVE MOLECULES 

The present invention provides novel polynucleotide molecules that, in some instances, have similarities with known molecules. The inventive DNAs were cloned from five different human cDNA libraries. In addition to these DNA molecules, the invention provides their protein translations and antibodies derived from them. The inventive DNA and protein sequences are shown individually in the Description of the Sequences. The inventive nucleic acids also include the complements of the DNA sequences provided in the Description of the Sequences, as well as their RNA counterparts. Methods of producing the molecules also are provided. Further, the invention provides methods for detecting all or part of the molecules and for detecting polynucleotides encoding all or part of the molecules.
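Because the complements and RNA counterparts are fixed by base pairing, they can be derived mechanically from the listed cDNA sequences. The following is a minimal Python sketch, assuming plain A/C/G/T input; the function names are illustrative only, and writing the complement 5'->3' (that is, reversed) is the usual convention rather than something the text above specifies.

```python
# Watson-Crick pairing table for upper- and lower-case DNA bases.
# Characters not listed (e.g. the ambiguity code N) pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the complementary strand, read 5'->3' (hence reversed)."""
    return dna.translate(_COMPLEMENT)[::-1]


def rna_counterpart(dna: str) -> str:
    """Return the RNA counterpart of a DNA sequence (T -> U)."""
    return dna.replace("T", "U").replace("t", "u")


# First 20 nt of clone DKFZphamy2_10h17, as listed in this document.
fragment = "CACAGAGATCATTGTCAACC"
print(reverse_complement(fragment))  # GGTTGACAATGATCTCTGTG
print(rna_counterpart(fragment))     # CACAGAGAUCAUUGUCAACC
```

Applying `reverse_complement` twice returns the original sequence, which is a convenient sanity check when deriving the complementary strands in bulk.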

The inventive molecules derive from five cDNA libraries: human fetal brain, human fetal kidney, human melanoma, human testis and human amygdala. For convenience, each sequence bears a designation that indicates from which library it is derived. In particular, these designations are: "hfbr" for human fetal brain; "hfkd" for human fetal kidney; "hmel" for human melanoma; "htes" for human testis; and "hamy" for human amygdala. The individual libraries were constructed and screened as described below in the examples.

The protein and DNA molecules of the invention are variously described herein as "target" molecules or "inventive" molecules. The sequences and other information pertinent to the nucleic acid and protein molecules of the invention are shown below in the Description of the Sequences.

Description of the Sequences 

Key to the Description of the Sequences 
The descriptions below provide the coding sequences of the inventive cDNAs, as well as the protein sequences and other useful information, as set out herein.

Grouping

The clones were assigned to the following sixteen functional 
and/or tissue-derived groups: 

1. Amygdala derived
2. Cell Cycle
3. Cell Structure and Motility
4. Differentiation/Development
5. Intracellular Transport and Trafficking
6. Melanoma derived
7. Metabolism
8. Nucleic Acid Management
9. Signal Transduction
10. Transmembrane Protein
11. Transcription Factors
12. Brain derived
13. Kidney derived
14. Mammary Carcinoma derived
15. Testes derived
16. Uterus derived

Description of Clone Files 

All individual clone files follow the same pattern, with the following sections separated by paragraphs:

1. Clone Name

The clone names are deciphered with reference to the 
following example: 

DKFZphfkd2_3k15, wherein the code represents:

• producer of library ("DKFZ") (for convenience, this reference may be eliminated)

• a "p" for "plasmid cDNA library" (for convenience, this reference may be eliminated)

• library name (e.g. hfbr = human fetal brain; hfkd = human fetal kidney; hmel = human melanoma; htes = human testis; hamy = human amygdala)

• an underscore ("_") to separate library information from plate information

• plate number (e.g. "3")

• plate coordinates (letter first; e.g. "k15")
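The naming scheme above is regular enough to decode mechanically. The Python sketch below is illustrative only: `parse_clone_name` and `CloneName` are hypothetical names, and it assumes that any digits directly after the library code (such as the "2" in "hamy2" or "hfkd2") belong to the library name, a detail the key does not explain.

```python
import re
from typing import NamedTuple

# Library codes as listed in the key above.
LIBRARIES = {
    "hfbr": "human fetal brain",
    "hfkd": "human fetal kidney",
    "hmel": "human melanoma",
    "htes": "human testis",
    "hamy": "human amygdala",
}

# Optional producer prefix "DKFZ", optional "p" (plasmid cDNA library),
# library name (letters plus any trailing digits), "_", plate number,
# then plate coordinates: row letter first, column number second.
_CLONE_RE = re.compile(
    r"^(?:DKFZ)?p?(?P<library>[a-z]+\d*)_(?P<plate>\d+)(?P<row>[a-z])(?P<column>\d+)$"
)


class CloneName(NamedTuple):
    library: str
    tissue: str
    plate: int
    row: str
    column: int


def parse_clone_name(name: str) -> CloneName:
    """Decode a clone designation such as 'DKFZphfkd2_3k15'."""
    m = _CLONE_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised clone name: {name!r}")
    library = m["library"]
    code = re.match(r"[a-z]+", library).group()  # alphabetic library code
    tissue = LIBRARIES.get(code, "unknown library")
    return CloneName(library, tissue, int(m["plate"]), m["row"], int(m["column"]))


print(parse_clone_name("DKFZphfkd2_3k15"))
# CloneName(library='hfkd2', tissue='human fetal kidney', plate=3, row='k', column=15)
```

Because the "DKFZ" and "p" prefixes are optional in the pattern, the shortened forms permitted by the key (for example "hamy2_10h17") decode to the same fields.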






2. Group
3. Introduction

a short review of the similarities and function of the protein, and of its possible applications

4. Short Information

specifications about the cDNA (completeness of the cDNA, similarity, who sequenced it, chromosomal localisation, length of the cDNA, and localisation of the poly A tail and polyadenylation signal)
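The clone files do not state how the poly A and polyadenylation-signal positions were determined. The Python sketch below shows one plausible approach under stated assumptions: a minimum run of 10 A's marks the poly A stretch, only the hexamers AATAAA and ATTAAA count as signals, and only a 50-nt window upstream of the stretch is searched. All three parameters, and the function names, are hypothetical.

```python
import re
from typing import Optional, Tuple

# Canonical polyadenylation hexamer and its most common variant.
# Other, rarer variants exist and are deliberately ignored here.
SIGNALS = ("AATAAA", "ATTAAA")


def find_poly_a(seq: str, min_run: int = 10) -> Optional[int]:
    """1-based start of the first run of at least `min_run` A's, or None."""
    m = re.search("A{%d,}" % min_run, seq.upper())
    return m.start() + 1 if m else None


def find_signal(seq: str, poly_a_start: int,
                window: int = 50) -> Optional[Tuple[int, str]]:
    """(1-based position, hexamer) of the signal closest to the poly-A start,
    searching only the `window` nt immediately upstream of it."""
    seq = seq.upper()
    lo = max(0, poly_a_start - 1 - window)
    region = seq[lo:poly_a_start - 1]
    best = None
    for hexamer in SIGNALS:
        pos = region.rfind(hexamer)  # last (closest) occurrence
        if pos != -1 and (best is None or pos > best[0]):
            best = (pos, hexamer)
    return None if best is None else (lo + best[0] + 1, best[1])


# bp 701-751 of clone DKFZphamy2_10h17 followed by a shortened poly A stretch.
tail = "GTGGCATTGTCGCTGCAGCCAACTTTGCCATTAAAACTCTTTGCCAAAGTT" + "A" * 20
start = find_poly_a(tail)
print(start, find_signal(tail, start))  # 52 (30, 'ATTAAA')
```

Positions here are relative to the fragment; the position conventions behind the values printed in the clone files are not given, so results from this sketch need not coincide with them.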

5. cDNA Sequence

6. BLASTn Results 

search results of blasting the cDNA sequence against all public 
databases 


7. Medline Entries 

information about genes/proteins similar to the novel cDNA (if 
available) 

8. Putative Encoded Protein Information

specifications about the encoded protein (ORF: length and 
localisation of the reading frame) 
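Section 8 reports each ORF by frame, start, end and peptide length. The Python sketch below shows one simplified way to derive such values: the longest ATG-initiated ORF on the forward strand, with the end given as the last base before the stop codon (under this convention a 180-codon ORF spans 540 bp, e.g. bp 38 to 577). That the annotation pipeline used this rule is an assumption; in particular, the sketch does not model ORFs that are open at the 5' end, such as the one reported from bp 3 for DKFZphamy2_10p7. The names `translate`, `longest_orf` and `Orf` are illustrative.

```python
from typing import NamedTuple, Optional

# Standard genetic code (NCBI table 1). Codons are enumerated with the
# bases in T, C, A, G order: TTT, TTC, TTA, TTG, TCT, ... GGG.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


class Orf(NamedTuple):
    start: int    # 1-based position of the first base of the ATG
    end: int      # 1-based position of the last base before the stop codon
    frame: int    # 1, 2 or 3: position of the first base of codon 1
    protein: str  # translation without the terminal stop


def translate(seq: str) -> str:
    """Translate complete codons; codons with other characters become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def longest_orf(seq: str) -> Optional[Orf]:
    """Longest ATG-initiated, stop-terminated ORF on the forward strand."""
    best = None
    for frame in (1, 2, 3):
        protein = translate(seq[frame - 1:])
        start = None
        for idx, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = idx  # first ATG after the previous stop
            elif aa == "*" and start is not None:
                orf = Orf(frame + 3 * start, frame + 3 * idx - 1,
                          frame, protein[start:idx])
                if best is None or len(orf.protein) > len(best.protein):
                    best = orf
                start = None
    return best


# bp 38-67 of DKFZphamy2_10h17 translate to the first ten residues listed.
print(translate("ATGCCTGCCTTGGAAGGGGCACCCCATACC"))  # MPALEGAPHT
print(longest_orf("CCATGGCTTAAG"))  # Orf(start=3, end=8, frame=3, protein='MA')
```

Only the three forward frames are scanned because the clone files report frames 1 to 3; a full ORF search would also translate the reverse complement.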

9. Protein Sequence


10. BLASTp Results 

search results of blasting the protein sequence against all 
public databases 
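For reference, the raw scores, bit scores, Expect values and P values printed in the BLAST sections are conventionally related by the Karlin-Altschul statistics. Here $S$ is the raw score, $\lambda$ and $K$ are parameters of the scoring system, $m$ and $n$ are the effective query and database lengths, and $S'$ is the bit score:

$$
E = K\,m\,n\,e^{-\lambda S}, \qquad
S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad
E = m\,n\,2^{-S'}, \qquad
P = 1 - e^{-E}
$$

These formulas describe a single high-scoring segment pair. The "Sum P(N)" values shown for some hits combine N segments and follow a different calculation. The BLAST implementation and parameters used to produce the clone files are not stated here.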

11. Pedant Information

output of the fully automated annotation, summarising peptide information, homologies and patterns as follows:

[LENGTH]

- length of the protein = number of amino acid residues

[MW]

- molecular weight of the protein

[pI]

- isoelectric point

[HOMOL]

- shows protein with closest similarity to the cDNA- 
encoded protein 

[FUNCAT]

- functional information according to a catalogue developed by the Munich Information Center for Protein Sequences (MIPS)

[BLOCKS]

- Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins. The blocks for the Blocks Database are made automatically by looking for the most highly conserved regions in groups of proteins documented in the Prosite Database. The Prosite pattern for a protein group is not used in any way to make the Blocks Database, and the pattern may or may not be contained in one of the blocks representing a group. These blocks are then calibrated against the SWISS-PROT database to obtain a measure of the chance distribution of matches. It is these calibrated blocks that make up the Blocks Database. The WWW versions of the Prosite and SWISS-PROT Databases that are used on this server are located at the ExPASy World Wide Web (WWW) Molecular Biology Server of the Geneva University Hospital and the University of Geneva. World Wide Web URL http://blocks.fhcrc.org/blocks/about_blocks.html is the entry point to the database.

- here Blocks segments found in the analysed protein sequences are displayed

[SCOP]

Nearly all proteins have structural similarities with other proteins and, in some of these cases, share a common evolutionary origin. The SCOP database provides a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in Brookhaven National Laboratory's Protein Data Bank (PDB). It is available as a set of tightly linked hypertext documents which make the large database comprehensible and accessible. In addition, the hypertext pages offer a panoply of representations of proteins, including links to PDB entries, sequences, references, images and interactive display systems. World Wide Web URL http://scop.mrc-lmb.cam.ac.uk/scop/ is the entry point to the database.

Existing automatic sequence and structure comparison tools cannot identify all structural and evolutionary relationships between proteins. The SCOP classification of proteins has been constructed manually by visual inspection and comparison of structures, but with the assistance of tools to make the task manageable and help provide generality. Proteins are classified to reflect both structural and evolutionary relatedness. Many levels exist in the hierarchy, but the principal levels are family, superfamily and fold. The exact position of the boundaries between these levels is to some degree subjective. The SCOP evolutionary classification is generally conservative: where any doubt about relatedness exists, new divisions were made at the family and superfamily levels.

- here SCOP segments found in the analysed protein sequences are displayed
[EC]

ENZYME is a repository of information relative to the nomenclature of enzymes. It is primarily based on the recommendations of the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (IUBMB), and it describes each type of characterized enzyme for which an EC (Enzyme Commission) number has been provided. World Wide Web URL http://www.expasy.ch/enzyme/ is the entry point to the database.

- here the EC number and name of enzymes with similarity to the analysed protein sequences are displayed

[PIRKW]

- functional information according to the Protein Information Resource (PIR) database catalogue developed by the Munich Information Center for Protein Sequences (MIPS), the National Biomedical Research Foundation (NBRF) and the International Protein Information Database in Japan (JIPID)

[SUPFAM]

- information according to the Protein Information Resource (PIR) database catalogue of protein superfamilies developed by the Munich Information Center for Protein Sequences (MIPS), the National Biomedical Research Foundation (NBRF) and the International Protein Information Database in Japan (JIPID)

[PROSITE]

please refer to 12. PROSITE Motifs

[PFAM]

please refer to 13. PFAM Motifs

[KW]

- overall 3-dimensional folding information

- 3D indicates that the protein is similar to a protein of which a 3-dimensional structure is known

- overall structural information


The last PEDANT block depicts information about the folding structure of the protein generated by PREDATOR. PREDATOR is a secondary structure prediction program. It takes as input a single protein sequence to be predicted and can optionally use a set of unaligned sequences as additional information to predict the query sequence. The mean prediction accuracy of PREDATOR is 68% for a single sequence and 75% for a set of related sequences. PREDATOR does not use multiple sequence alignment. Instead, it relies on careful pairwise local alignments of the sequences in the set with the query sequence to be predicted.

World Wide Web URL http://www.embl-heidelberg.de/argos/predator/predator_info.html is the entry point to the database.

- H = helix, E = extended or sheet, _ = coil, T = transmembrane, B = beta

- x indicates a low-complexity region with repeat-like 
structure which is omitted in all BLAST searches 
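The PRD (PREDATOR) and SEG lines of each PEDANT report can be summarised directly. The Python sketch below computes per-state percentages from a PRD string and the masked fraction from a SEG string. It assumes that a value such as the [KW] LOW_COMPLEXITY percentage is the number of residues masked as "x" divided by the protein length; this is not a documented PEDANT formula. Note also that the report lines use lower-case h/e/c, whereas the key above lists upper-case codes. Function names are illustrative.

```python
from collections import Counter
from typing import Dict


def structure_composition(prd: str) -> Dict[str, float]:
    """Percentage of residues per predicted state in a PRD string
    (case-insensitive, so 'H' and 'h' are counted together)."""
    counts = Counter(prd.strip().lower())
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2)
            for state, n in sorted(counts.items())}


def low_complexity_percent(seg: str, length: int) -> float:
    """Percentage of a protein of `length` residues masked as 'x' by SEG."""
    return round(100.0 * seg.count("x") / length, 2)


print(structure_composition("hhhhcceeee"))  # {'c': 20.0, 'e': 40.0, 'h': 40.0}
print(low_complexity_percent("x" * 10, 180))  # 5.56
```

For a full report, the PRD lines of one protein should be concatenated before calling `structure_composition`, so that the percentages refer to the whole sequence rather than to one 60-residue line.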



12. PROSITE Motifs 
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs. World Wide Web URL http://www.expasy.ch/prosite/ is the entry point to the database.
A description of the PROSITE consensus patterns is provided herein, after the description of the individual sequences.
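PROSITE patterns use a compact syntax that maps onto ordinary regular expressions. The Python sketch below converts the common pattern elements (x, [..], {..}, repeat counts, and the < and > terminal anchors) and scans a protein for matches. It omits rarer syntax such as ">" inside brackets, and `re.finditer` reports non-overlapping matches only. The three pattern strings in the usage lines are given as assumptions for the motifs named in the clone files (ZINC_FINGER_C3HC4, PRENYLATION, MULTICOPPER_OXIDASE1) and should be checked against the PROSITE release actually used; the protein fragments come from the clone records in this document.

```python
import re
from typing import Iterator, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE pattern, e.g. 'C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA].',
    into a Python regular expression string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        n_term = element.startswith("<")   # '<': anchored at the N-terminus
        c_term = element.endswith(">")     # '>': anchored at the C-terminus
        element = element.strip("<>")
        # Separate an optional repeat count such as x(2) or x(2,4).
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.group(1), m.group(2), m.group(3)
        if core == "x":
            regex = "."                          # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"      # any residue except these
        else:
            regex = core                         # single residue or [ABC] set
        if low is not None:
            regex += "{%s}" % low if high is None else "{%s,%s}" % (low, high)
        parts.append(("^" if n_term else "") + regex + ("$" if c_term else ""))
    return "".join(parts)


def find_motif(pattern: str, protein: str) -> Iterator[Tuple[int, int]]:
    """Yield 1-based (start, end) positions of non-overlapping matches."""
    for m in re.finditer(prosite_to_regex(pattern), protein):
        yield m.start() + 1, m.end()


# Assumed pattern strings (see lead-in).
c3hc4 = "C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA]."
caax = "C-{DENQ}-[LIVM]-x>."
mco1 = "G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW]."

print(prosite_to_regex(caax))                              # C[^DENQ][LIVM].$
print(list(find_motif(c3hc4, "SCLHSVCEQCLQ")))             # [(2, 11)]
print(list(find_motif(caax, "RNPLSACSIM")))                # [(7, 10)]
print(list(find_motif(mco1, "GVFVIYSISPNTSEDGLFVEV")))     # [(1, 21)]
```

The fragments are residues 80-91 and 171-180 of DKFZphamy2_10h17 and residues 151-171 of DKFZphamy2_10p7; adding the fragment offsets back converts the match coordinates into full-length protein positions.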

13. PFAM Motifs 
PFAM (protein families) is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. World Wide Web URL http://www.sanger.ac.uk/Pfam/ is the entry point to the database.

In the charts below, the groups of sequences are listed, and the descriptions of the individual clones follow.
DKFZphamy2_10h17



group: signal transduction 

DKFZphamy2_10h17 encodes a novel 180 amino acid protein which shows weak similarity to murine hacl.

The novel protein contains a zinc finger motif of the C3HC4 type (RING finger). The RING-finger domain is involved in mediating protein-protein interactions. Proteins containing a RING finger include mammalian V(D)J recombination activating protein (RAG1), mouse rpt-1, human rfp, the human 52 kDa Ro/SS-A protein and others. The family of RING finger proteins contains a number of oncogenes, for example PML (a probable transcription factor), BRCA1, and the mammalian cbl and bmi-1 proto-oncogenes.






The new protein can find application in modulating protein-protein interactions and in studying the expression profile of amygdala-specific genes.



weak similarity to hacl (Mus musculus)
Sequenced by LMU
Locus: unknown
Insert length: 835 bp

Poly A stretch at pos. 751, polyadenylation signal at pos. 721



1 CACAGAGATC ATTGTCAACC AGGCCTGTGG GGGGGACATG CCTGCCTTGG

51 AAGGGGCACC CCATACCCCG CCACTGCCAC GGCGGCCCCG TAAGGGAAGC

101 TCGGAGCTGG GCTTTCCCCG CGTGGCCCCA GAGGATGAGG TCATTGTGAA

151 TCAGTACGTG ATTCGGCCTG GCCCCTCGGC CTCGGCGGCT TCTTCGGCGG

201 CGGCAGGCGA GCCCCTGGAG TGCCCCACCT GTGGGCACTC CTACAATGTC

251 ACCCAGCGGA GGCCCCGCGT GCTGTCCTGC CTGCACTCTG TGTGTGAGCA

301 GTGCCTGCAG ATTCTCTACG AGTCCTGCCC CAAGTACAAG TTCATCTCCT

351 GCCCCACCTG CCGCCGTGAG ACTGTGCTCT TCACCGACTA CGGCCTGGCC

401 GCGCTGGCTG TCAACACGTC CATCCTGAGC CGCCTGCCGC CTGAGGCGCT

451 GACGGCCCCA TCCGGGGGTC AGTGGGGGGC TGAGCCCGAG GGCAGCTGCT

501 ACCAGACCTT CCGGCAGTAC TGTGGGGCCG CGTGCACCTG CCACGTGCGG

551 AACCCACTGT CCGCCTGCTC CATCATGTAG TAGCGCCTGC CTGCCCGCCA

601 CTGCCCGCTG AGCCTCGCTC GCTGCTTCTT CAGGGACCCG GCCCTGCCCT

651 GCCGCCCGCT GACCCTTCCT TCCCCACCAT GGCTTCCGGC CCCACCCCGA

701 GTGGCATTGT CGCTGCAGCC AACTTTGCCA TTAAAACTCT TTGCCAAAGT

751 TAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA

801 AAAAAAAAAA AAAAGAAAAA AAAAAAAAAA AAAAG






BLAST Results 
Medline entries

No Medline entry

Peptide information for frame 2 



ORF from 38 bp to 577 bp; peptide length: 180
Category: similarity to unknown protein

Classification: Cellular transport and traffic
Prosite motifs: PRENYLATION (177-180)
ZINC_FINGER_C3HC4 (81-90)

1 MPALEGAPHT PPLPRRPRKG SSELGFPRVA PEDEVIVNQY VIRPGPSASA
51 ASSAAAGEPL ECPTCGHSYN VTQRRPRVLS CLHSVCEQCL QILYESCPKY
101 KFISCPTCRR ETVLFTDYGL AALAVNTSIL SRLPPEALTA PSGGQWGAEP
151 EGSCYQTFRQ YCGAACTCHV RNPLSACSIM



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_10h17, frame 2
No Alert BLASTP hits found 

Pedant information for DKFZphamy2_10h17, frame 2

Report for DKFZphamy2_10h17.2

[LENGTH] 180

[MW] n40D-27

[pI] 7.15

[HOMOL] TREMBL:AC007727_7 gene: "F8K7.7"; Arabidopsis thaliana chromosome 1 BAC F8K7 sequence, complete sequence. 3e-06

[BLOCKS] BLDDfl3TC

[BLOCKS] PFOmtEA

[BLOCKS] PR0D7b3H

[BLOCKS] BL00518 Zinc finger, C3HC4 type, proteins

[PROSITE] PRENYLATION 1

[PROSITE] ZINC_FINGER_C3HC4 1

[PFAM] Zinc finger, C3HC4 type (RING finger)

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 5.56 %
SEQ MPALEGAPHTPPLPRRPRKGSSELGFPRVAPEDEVIVNQYVIRPGPSASAASSAAAGEPL

SEG xxxxxxxxxx • • • • 

PRD cccccccccccccccccccccccccccccccceeeeeeeeeeecccccchhhhhhhcccc 

SEQ ECPTCGHSYNVTQRRPRVLSCLHSVCEQCLQILYESCPKYKFISCPTCRRETVLFTDYGL

SEG 

PRD cccccccccccccccceeeecchhhhhhhhhhhhhccccceeeecccccceeeeeccccc 

SEQ AALAVNTSILSRLPPEALTAPSGGQWGAEPEGSCYQTFRQYCGAACTCHVRNPLSACSIM

SEG

PRD cchhhhhhhhhcccccccccccccccccccccccchhhhhhhcceeeecccccceeeccc 



Prosite for DKFZphamy2_10h17.2

PS00294 177->180 PRENYLATION PDOC00266

PS00518 81->90 ZINC_FINGER_C3HC4 PDOC00449



Pfam for DKFZphamy2_10h17.2

HMM_NAME Zinc finger, C3HC4 type (RING finger)
HMM

*CPICFcTF<21]>yPlilPFdePmnlPCgHsFCypCIrrlj C

CP C Y+ +P+ L C+HS C+ C+ ++

C

Query 62 CPTC GHSYNVTQRRPRVLSCLHSVCEQCL-

QILYESCPKYKFISC 105

HMM PmC*
PC

Query 106 PTC 108
group: signal transduction

DKFZphamy2_10p7 encodes a novel 1615 amino acid protein with similarity to Na+/Ca2+ exchange proteins.

The transport of Ca2+ from the sarcoplasm into the sarcoplasmic reticulum is an essential process in the initiation of muscle relaxation.

In addition, the novel protein contains a PROSITE multicopper oxidase signature. Multicopper oxidases are enzymes that possess three spectroscopically different copper centers.

The new protein can find application in modulating Na+/Ca2+ exchange and voltage-dependent processes.

similarity to Na+/Ca2+ exchange proteins 

ATG in frame 3 is first in clone. 

Sequenced by LMU

Locus: unknown

Insert length: 5236 bp
Poly A stretch at pos. 5216, no polyadenylation signal found



1 CGGACGCGTG GGCGGACGCG TGGGCCCTGT ATACCTGTGC CACTTTGTGC

51 CTTAAGGAAC AAGCTTGCTC AGCGTTTTCA TTTTTCAGTG CTTCTGAGGG

101 TCCCCAGTGT TTCTGGATGA CATCATGGAT CAGCCCAGCT GTCAACAATT

151 CAGACTTCTG GACCTACAGG AAAAACATGA CCAGGGTAGC ATCTCTTTTT

201 AGTGGTCAGG CTGTGGCTGG GAGTGACTAT GAGCCTGTGA CAAGGCAATG

251 GGCCATAATG CAGGAAGGTG ATGAATTCGC AAATCTCACA GTGTCTATTC

301 TTCCTGATGA TTTCCCAGAG ATGGATGAGA GTTTTCTAAT TTCTCTCCTT

351 GAAGTTCACC TCATGAACAT TTCAGCCAGT TTGAAAAATC AGCCAACCAT

401 AGGACAGCCA AATATTTCTA CAGTTGTCAT AGCACTAAAT GGTGATGCCT

451 TTGGAGTGTT TGTGATCTAC AGTATTAGTC CCAATACTTC CGAAGATGGC

501 TTATTTGTTG AAGTTCAGGA GCAGCCCCAA ACCTTGGTGG AGCTGATGAT

551 ACACAGGACA GGGGGCAGCT TAGGTCAAGT GGCAGTCGAA TGGCGTGTTG

601 TTGGTGGAAC AGCTACTGAA GGTTTAGATT TTATAGGTGC TGGAGAGATT

651 CTGACCTTTG CTGAAGGTGA AACCAAAAAG ACAGTCATTT TAACCATCTT

701 GGATGACTCT GAACCAGAGG ATGACGAAAG TATCATAGTT AGTTTGGTGT

751 ACACTGAAGG TGGAAGTAGA ATTTTGCCAA GCTCCGACAC TGTTAGAGTG

801 AACATTTTGG CCAATGACAA TGTGGCAGGA ATTGTTAGCT TTCAGACAGC

851 TTCCAGATCT GTCATAGGTC ATGAAGGAGA AATTTTACAA TTCCATGTGA

901 TAAGAACTTT CCCTGGTCGA GGAAATGTTA CTGTTAACTG GAAAATTATT

951 GGGCAAAATC TAGAACTCAA TTTTGCTAAC TTTAGCGGAC AACTTTTCTT

1001 TCCTGAGGGG TCGTTGAATA CAACATTGTT TGTGCATTTG TTGGATGACA

1051 ACATTCCTGA GGAGAAAGAA GTATACCAAG TCATTCTGTA TGATGTCAGG

1101 ACACAAGGAG TTCCACCAGC CGGAATCGCC CTGCTTGATG CTCAAGGATA

1151 TGCAGCTGTC CTCACAGTAG AAGCCAGTGA TGAACCACAT GGAGTTTTAA

1201 ATTTTGCTCT TTCATCAAGA TTTGTGTTAC TACAAGAGGC TAACATAACA

1251 ATTCAGCTTT TCATCAACAG AGAATTTGGA TCTCTAGGAG CTATCAATGT
1301 CACATATACC ACGGTTCCTG GAATGCTGAG TCTGAAGAAC CAAACAGTAG

1351 GAAACCTAGC AGAGCCAGAA GTTGATTTTG TCCCTATCAT TGGCTTTCTG

1401 ATTTTAGAAG AAGGGGAAAC AGCAGCAGCC ATCAACATTA CCATTCTTGA

1451 GGATGATGTA CCAGAGCTAG AAGAATATTT CCTGGTGAAT TTAACTTACG

1501 TTGGACTTAC CATGGCTGCT TCAACTTCAT TTCCTCCCAG ACTAGATTCA

1551 GAAGGTTTGA CTGCACAAGT TATTATTGAT GCCAATGATG GGGCCCGAGG

1601 TGTAATTGAA TGGCAACAAA GCAGGTTTGA AGTAAATGAA ACCCATGGAA

1651 GTTTAACATT GGTAGCCCAG AGGAGCAGAG AACCTCTTGG CCATGTTTCC

1701 TTATTTGTGT ATGCTCAGAA TTTGGAAGCA CAAGTGGGGC TGGATTATAT

1751 CTTCACCCCA ATGATTCTTC ATTTTGCTGA TGGAGAAAGG TATAAAAATG

1801 TCAATATCAT GATTCTTGAT GATGACATTC CAGAAGGAGA TGAAAAATTT

1851 CAGCTGATTT TAACAAATCC TTCTCCTGGA CTAGAGCTAG GGAAAAATAC

1901 AATAGCCTTA ATTATTGTCC TTGCTAATGA TGACGGCCCT GGAGTTCTAT

1951 CATTTAACAA CAGTGAGCAC TTTTTCCTAA GAGAGCCAAC AGCTCTCTAC

2001 GTCCAGGAGA GTGTTGCAGT ATTGTACATT GTTCGGGAAC CTGCACAAGG

2051 ATTGTTTGGA ACAGTGACAG TTCAGTTCAT TGTGACAGAA GTGAATTCCT

2101 CAAATGAATC TAAAGATCTG ACTCCTTCCA AAGGCTATAT TGTTTTAGAA

2151 GAAGGTGTTC GATTCAAGGC CCTACAAATA TCTGCCATAT TAGACACGGA

2201 ACCAGAAATG GATGAGTATT TTGTTTGCAC CTTGTTTAAT CCAACTGGAG

2251 GTGCTAGACT AGGGGTGCAT GTTCAAACCC TGATAACAGT TTTGCAAAAC

2301 CAGGCCCCTT TGGGGCTATT CAGTATCTCT GCAGTTGAAA ATAGAGCCAC

2351 CTCCATAGAC ATCGAAGAAG CCAATAGGAC CGTGTATTTA AATGTATCTC

2401 GAACTAATGG CATTGATTTG GCTGTGAGTG TGCAGTGGGA GACAGTATCT

2451 GAAACAGCCT TTGGCATGAG GGGAATGGAT GTTGTGTTTT CCGTATTTCA

2501 AAGTTTTTTG GATGAATCAG CTTCTGGCTG GTGTTTCTTT ACTTTGGAAA

2551 ATTTAATATA TGGTATAATG TTAAGAAAAT CATCTGTTAC TGTTTACCGA

2601 TGGCAGGGGA TTTTTATTCC AGTTGAGGAT TTAAATATAG AAAATCCTAA

2651 AACTTGTGAG GCCTTTAATA TTGGTTTTTC TCCCTACTTT GTGATTACTC

2701 ATGAAGAAAG AAATGAAGAA AAGCCTTCTC TTAACAGTGT GTTTACATTC

2751 ACATCTGGAT TTAAATTATT CCTGGTACAA ACAATCATTA TTCTGGAAAG

2801 TTCTCAAGTA AGATATTTTA CTTCAGACAG CCAAGATTAT TTAATCATTG

2851 CAAGTCAAAG AGATGATTCC GAATTAACTC AGGTCTTCAG GTGGAATGGA

2901 GGAAGCTTCG TGTTGCATCA AAAACTCCCT GTCCGAGGTG TGCTGACCGT

2951 GGCCTTGTTC AACAAGGGAG GCTCTGTGTT CTTAGCCATT TCCCAGGCTA

3001 ATGCCAGGCT AAACTCCCTT TTATTCAGAT GGTCTGGCAG TGGGTTTATT

3051 AACTTTCAAG AGGTGCCTGT CAGTGGGACA ACAGAAGTTG AGGCTTTGTC

3101 TTCAGCCAAT GATATTTACC TAATATTTGC CAAAAATGTC TTTCTAGGAG

3151 ATCAGAATTC AATTGATATT TTCATCTGGG AGATGGGACA GTCTTCCTTC

3201 AGGTATTTTC AGTCTGTAGA TTTTGCTGCT GTTAACAGAA TCCACTCCTT

3251 CACACCAGCC TCAGGAATAG CCCACATACT TCTTATTGGC CAAGATATGT

3301 CTGCTCTTTA CTGCTGGAAT TCGGAGCGTA ATCAATTCTC TTTTGTTCTG

3351 GAAGTACCTT CTGCTTATGA TGTGGCTTCT GTTACAGTAA AGTCCCTTAA

3401 TTCAAGCAAG AATTTAATAG CTCTAGTGGG AGCTCATTCA CATATATATG

3451 AGCTAGCCTA CATTTCCAGC CATTCTGACT TTATTCCTAG TTCAGGTGAA

3501 CTGATATTTG AACCTGGTGA GAGAGAAGCT ACAATAGCAG TAAATATCCT

3551 TGATGATACA GTTCCAGAAA AAGAAGAATC CTTCAAAGTT CAACTTAAAA

3601 ATCCCAAAGG AGGAGCAGAG ATTGGCATTA ATGATTCTGT AACAATAACC

3651 ATTCTGTCTA ATGATGATGC CTATGGAATT GTTGCATTTG CTCAGAATTC

3701 ATTATATAAG CAAGTGGAAG AAATGGAGCA AGATAGCCTA GTAACCTTGA

3751 ACGTTGAACG CTTAAAAGGA ACATATGGCC GTATAACCAT AGCATGGGAA

3801 GCTGATGGAA GTATTAGTGA TATATTTCCT ACCTCAGGAG TGATTTTATT

3851 TACTGAAGGC CAGGTACTGT CAACAATCAC TCTAACTATT CTTGCTGATA

3901 ATATACCAGA GTTATCAGAG GTTGTGATTG TAACCCTCAC CCGTATCACC

3951 ACAGAAGGGG TTGAGGACTC ATACAAAGGT GCTACTATTG ATCAGGACAG

4001 AAGCAAGTCT GTTATAACAA CTTTGCCCAA TGACTCACCT TTTGGCTTGG

4051 TGGGCTGGCG TGCTGCGTCT GTCTTCATTA GAGTAGCAGA GCCTAAAGAA

4101 AACACCACCA CTCTTCAGTT ACAAATAGCT CGAGATAAAG GACTACTTGG

4151 GGATATTGCC ATTCACTTGA GAGCTCAACC CAATTTCTTA CTGCATGTCG
4201 ATAATCAAGC TACTGAGAAT GAAGATTATG TATTGCAAGA AACAATAATA

4251 ATAATGAAAG AAAACATAAA AGAAGCTCAT GCCGAAGTTT CCATTTTGCC

4301 GGATGACCTT CCTGAATTGG AGGAAGGATT TATTGTCACT ATCACTGAGG

4351 TGAACCTGGT GAACTCTGAC TTCTCTACAG GACAGCCAAG TGTGCGGAGG

4401 CCCGGAATGG AAATAGCTGA GATAATGATA GAAGAAAATG ACGATCCCAG

4451 AGGAATTTTT ATGTTTCATG TTACTAGAGG CGCTGGGGAA GTTATTACTG

4501 CCTATGAGGT GCCTCCACCC TTGAACGTTC TTCAAGTTCC TGTAGTCCGG

4551 CTGGCTGGAA GCTTTGGGGC AGTAAATGTT TATTGGAAAG CATCACCAGA

4601 CAGTGCTGGC CTGGAAGACT TTAAACCATC TCATGGGATT CTTGAATTTG

4651 CAGATAAACA GGTTACTGCA ATGATAGAAA TCACCATAAT TGATGATGCT

4701 GAATTTGAAT TGACAGAGAC GTTCAATATT TCCTTGATCA GTGTTGCTGG

4751 AGGTGGCAGA CTTGGTGATG ATGTTGTGGT AACTGTTGTT ATTCCACAAA

4801 ATGATTCTCC ATTTGGAGTA TTTGGATTTG AAGAAAAGAC TGTAAGTTAA

4851 ACATATCAGG GGAAAGCCTT GTTTCAGGCT AGCGTTTCAT GTAATTTTGA

4901 GTAGAAAGTG TCTCACATTT TTGTTTTGGA AGTCTTGGCC AGGCATGGTG

4951 GCTCATGCCA GTAATCCCAG CACTTTGGGA GGCCGCAGCG GGCAGATCAC

5001 GAGGTCAGGA GATTGACACC ATCCTGGCCA ATATGGTTGA ATTCCCGTCT

5051 CTACTGAAAG TACAAAAATT AGCTGGGCGT GGTGGCACAT GCCTGTATTC

5101 CCAGATACTT GGGAGGCTGA GGCAGGAGAC TCGCTTGAAC CCAGGAGGCA

5151 GAGGTTGCAG TGAGCTGAGA TCACGCCATT GCACTCCAGC CTGGCGACAT

5201 AGAGAGACTC CATCTCAAAA AAAAAAAAAA AAAAAG



BLAST Results 

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 3 bp to 4847 bp; peptide length: 1615
Category: putative protein

Classification: Cell signaling/communication
Prosite motifs: MULTICOPPER_OXIDASE1 (151-171)

1 DAWADAWALY TCATLCLKEQ ACSAFSFFSA SEGPQCFWMT SWISPAVNNS

51 DFWTYRKNMT RVASLFSGQA VAGSDYEPVT RQWAIMQEGD EFANLTVSIL

101 PDDFPEMDES FLISLLEVHL MNISASLKNQ PTIGQPNIST VVIALNGDAF

151 GVFVIYSISP NTSEDGLFVE VQEQPQTLVE LMIHRTGGSL GQVAVEWRVV

201 GGTATEGLDF IGAGEILTFA EGETKKTVIL TILDDSEPED DESIIVSLVY

251 TEGGSRILPS SDTVRVNILA NDNVAGIVSF QTASRSVIGH EGEILQFHVI

301 RTFPGRGNVT VNWKIIGQNL ELNFANFSGQ LFFPEGSLNT TLFVHLLDDN

351 IPEEKEVYQV ILYDVRTQGV PPAGIALLDA QGYAAVLTVE ASDEPHGVLN

401 FALSSRFVLL QEANITIQLF INREFGSLGA INVTYTTVPG MLSLKNQTVG

451 NLAEPEVDFV PIIGFLILEE GETAAAINIT ILEDDVPELE EYFLVNLTYV

501 GLTMAASTSF PPRLDSEGLT AQVIIDANDG ARGVIEWQQS RFEVNETHGS

551 LTLVAQRSRE PLGHVSLFVY AQNLEAQVGL DYIFTPMILH FADGERYKNV

601 NIMILDDDIP EGDEKFQLIL TNPSPGLELG KNTIALIIVL ANDDGPGVLS




WO 01/98454 



PCT/IB01/02050 



651 FNNSEHFFLR EPTALYVflES VAVLYIVREP AflGLFGTVTV (3FIVTEVNSS

701 NESKDLTPSK GYIVLEEGVR FKALfllSAIL DTEPEHDEYF VCTLFNPTGG

751 ARLGVHVflTL ITVLflNflAPL GLFSISAVEN RATSIDIEEA NRTVYLNVSR

801 TNGIDLAVSV (2UETVSETAF GMRGriDVVFS VFflSFLDESA SGUCFFTLEN

851 LIYGINLRKS SVTVYRbJUGI FIPVEDLNIE NPKTCEAFNI GFSPYFVITH

901 EERNEEKPSL NSVFTFTSGF KLFLVflTIII LESSflVRYFT SDSQDYLIIA

951 SflRDDSELTfi VFRUNGGSFV LHflKLPVRGV LTVALFNKGG SVFLAISflAN

1001 ARLNSLLFRU SGSGFINFflE VPVSGTTEVE ALSSANDIYL IFAKNVFLGD

1051 flNSIDIFIUE riGiJSSFRYFfl SVDFAAVNRI HSFTPASGIA HILLIGflDMS

1101 ALYCUNSERN (3FSFVLEVPS AYDVASVTVK SLNSSKNLIA LVGAHSHIYE

1151 LAYISSHSDF IPSSGELIFE PGEREATIAV NILDDTVPEK EESFKVC2LKN

1201 PKGGAEIGIN DSVTITILSN DDAYGIVAFA ANSLYKflVEE MECDSLVTLN

1251 VERLKGTYGR ITIAldEADGS ISDIFPTSGV ILFTEGfiVLS TITLTILADN

1301 IPELSEVVIV TLTRITTEGV EDSYKGATID ODRSKSVITT LPNDSPFGLV

1351 GURAASVFIR VAEPKENTTT LflLfllARDKG LLGDIAIHLR AflPNFLLHVD

1401 NflATENEDYV L<2ETIIIHKE NIKEAHAEVS ILPDDLPELE EGFIVTITEV

1451 NLVNSDFSTG CPSVRRPGME IAEII1IEEND DPRGIFHFHV TRGAGEVITA

1501 YEVPPPLNVL CVPVVRLAGS FGAVNVYUKA SPDSAGLEDF KPSHGILEFA

1551 DKUVTAHIEI TIIDDAEFEL TETFNISLIS VAGGGRLGDD VVVTVVIPflN

1601 DSPFGVFGFE EKTVS
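The listing gives the polypeptide as numbered rows. Each row starts with the 1-based position of its first residue (51, 101, 151, ...) and carries up to fifty residues written as 10-residue blocks. The Python sketch below assumes each row is available as a plain string in that layout. It joins the rows into one continuous sequence and checks every stated start position against the number of residues read so far.

The function name `join_numbered_rows` is illustrative only. The check catches misnumbered or misplaced rows, and a position that is not a plain integer also raises `ValueError` through `int()`. OCR substitutions inside the residue blocks pass through unchanged.

```python
def join_numbered_rows(rows, first_position=1):
    """Join rows such as '51 ABCDEFGHIJ KLMNOPQRST ...' into one sequence.

    Each non-blank row is assumed to start with the 1-based position of its
    first residue, followed by whitespace-separated residue blocks.
    Raises ValueError when a row's stated start does not match the number
    of residues collected so far (or when the position is not an integer).
    """
    pieces = []
    expected = first_position
    for row in rows:
        fields = row.split()
        if not fields:  # blank separator lines between rows are skipped
            continue
        start = int(fields[0])
        if start != expected:
            raise ValueError(f"row starts at {start}, expected {expected}")
        residues = "".join(fields[1:])
        pieces.append(residues)
        expected += len(residues)
    return "".join(pieces)


# Toy rows with made-up residues, not taken from the listing.
print(join_numbered_rows(["1 ABCDEFGHIJ KLMNOPQRST", "", "21 UVWXY"]))
# prints ABCDEFGHIJKLMNOPQRSTUVWXY
```

In this listing the last row starts at 1601 and holds 15 residues, so a successful join ends at residue 1615.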



BLASTP hits 


No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_10p7, frame 3

TREMBL:AF055084_1 gene: "VLGR1"; product: "very large G-protein coupled receptor-1"; Homo sapiens very large G-protein coupled receptor-1 (VLGR1) mRNA, complete cds., N = 3, Score = 2SM, P = 1.2e-33






TREMBL ^MAFIAI?..! gene: "Calx"; product: "CALX"; Drosophila melanogaster 3Na(+)-1Ca(2+) exchanger (Calx) mRNA, complete cds., N = 1, Score = 17S, P = S-Se-OT



>TREMBL:AF055084_1 gene: "VLGR1"; product: "very large G-protein coupled receptor-1"; Homo sapiens very large G-protein coupled receptor-1 (VLGR1) mRNA, complete cds.

Length = l-^b?



HSPs: 

Score = 2AM (M2.b bits), Expect = 1.2e-33, Sum P(3) = 1.2e-33
Identities = 112/736 (2b%), Positives = 3m/73fl (122)



Query: b?

SG<3AVAGSDYEPVTR(2ldAIM(2EGl>EFANLTVSILPI>DFPEnDESFLISLLEVHLI"lNISAS 12b

S + G])Y+ a G + + +SI+ P+ E +E +E+ 

L + 



Sbjct: IDE SSASPGGVDYI-LHGSTVTFflHGflNLSFINISIIDDNESEFEEP 

IEILLTGATGG 155 

Query: 127

LKN(3PTIG(2PNISTVVIALN6DAF6VFVIYSISPNTSEI>GLFVEV(aEi2P(2TLV-ELniHR 185 

+G+ +S ++IA + FGV N S+ + + 

T++ L++ R 

Sbjct: ISb A VLGRHLVSRIIIAKSDSPFGVIRFL NflSK 

ISIANPNSTMILSLVLER 2D3 



Query: lAb TGGSLGflVAVEURVVGGTATEGL DFIG-AGEILTFAEGETK-

KTVILTXXXXXXX 23B 

TGG LG++ V 111 VG + E L D + F EGE 

+T+ILT 
Sbjct: SQM

TGGLLGEIflVNUETVGPNSflEALLPflNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEI 2b3 

Query: 231 XXXXXXXXXLVYTEGGSRILPSSDTVRVNILANDNVAGIVSF--

(3TASRSVIGH EG 212 

L +G +++ + V + I + G+V F +T S+

EG 

Sbjct: EbM 

EVEETFIIKLHLVKGEAKLDSRAKDVTLTIfiEFGDPNGVVflFAPETLSKKTYSEPLALEG 323 

Query: 213 EILflFHVIRTFPGR-GNVTVNUKIIGfl-
NLELNFANFSGflLFFPEGSLNTTLFVHLLDDN 3S0 

+L +R G G + V U++ + ++ +F + SG +G + 

VHLL D 

Sbjct: 324 

PLLITFFVRRVKGTFGEIMVYWELSSEFDITEDFLSTSGFFTIADGESEASFDVHLLPDE 3A3
Query: 351

IPEEKEVYflVILYDVRTflGVPPAGIALLMflGYAAVLTVEASDEPHGVLNFAL-SSRFVL 401 
+PE +E Y + L V G A LD + +V A+I>+PHGV 

FAL S R +

Sbjct: 384 VPEIEEDYVIflLVSVE GG AELDLEKSITUFSVYANDDPHGV — 

FALYSDRflSI 434 

Query: HID LflEANI — TlflLFINREFGSLGAINVTYTTVPGMLSLKNflT-
VGNLAEPEVDFVPIIGFL 4bb

L N+ +1(3+ IRG+G + V K fl V AE + 

L 

Sbjct : 435 LIGflNLIRSIfllNITRLAGTFGDVAVGLRISSDH KEflPIVTENAERfl — 

L 4fl2 


Query: 4b7

ILEEGETAAAINITILEDDVPELEEYFLVNLTYVGLTMAASTSFPPRLDSEGLTAtJVIID 52b 
++++G T +1 LF + LVL PLE 

+ A V+ 

Sbjct: ^a3 VVKDGATYKVDVVPIKNflVFLSLGSNFTLflLVTVMLVGGRFYGIIPTILfl-
EAKSA-VLPV S4D 

Query: 527 ANDGARGVIEWflflSRFEV-NETHGSLTLVAflRSREPLGHVSLFV- —
YAflNLEAflVGLDY 562 
+ A + ++ + F++ N T G+ ++ R R G +S+ YA

LE + 

Sbjct: 541 SEKAANSfiVGFESTAFflLMNITAGTSHVMISR- 
RGTYGALSVAUTTGYAPGLEIPEFIVV 511 
Query: Sfi3 -IFTPMI--

LHFADGERYKNVNIMILDDDIPEGDEKFflLILTNPSPGLELGKNTIALIIV b3=i 

TP + L F+ GE+ K V + P E F L L+ G 

+ IV

Sbjct: bOO GNllTPTLGSLSFSHGEflRKGVFLlilTFPS" 
PGIdPEAFVLHLSGVflSSAPGGAflLRSGFIV b5? 

Query: b40 LANDDGPGVLSFN-
NSEHFFLREPTALYVflESVAVLYIVREPAflGLFGTVTVflFIVTEVN b=!6

A + GV F+ . + S + + E T + ++ V L+ G + 

T 

Sbjct: bSfl -AEIEPPIGVFflFSTSSRNIIVSEDTflll-IRLHVflRLF 

GFHSDLIKVSYflTTAG 70S 


Query: b=l=i SSNESKDLTP-SKGYIVLEEGVRFKALfllSAILDTEPEMDEYFVCTL

FNP 74? 

S+ +D P G + ++ +1+ I D E++E+F L 

F+ 

Sbjct: 70=i

SAKPLEDFEPVflNGELFFflKFflTEVDFEITIINDflLSEIEEFFYINLTSVEIRGLflKFDV 7b6 

Query: 746 TGGARLGVHVflT-LITVLflNflAPLGLFSISAVENR-ATSIDIE

EANRTVYLNVSRT fiOl 

RL + +IT+L N G+ IS E A ++D E T

YL+ S+T 

Sbjct: 7b=J NUSPRLNLDFSVAVITILDNDDLAGfl- 
DISFPETTVAVAVDTTLIPVETESTTYLSTSKT 627 

Query: 602 NGI 604

I 

Sbjct: 626 TTI 630 

Score = 2bb (3^1 bits), Expect = 4.0e-25, Sum P(3) = 4.0e-25
Identities = 17S/706 (24%), Positives = 30b/?06 (43%)

Query: 131

PTIGflPNISTVVIALNGDAFGVFVIYSISPNTSEDGLFVEVflEflPflTLVELIIIHRTGGSL ISO 
P IG +1 ++I N +A G+ P + EV+E L+ + 

+ R G+

Sbjct: 3=1 PEIGNISIVRIIII1KNDNAEGII— EFDPKYTA FEVEEDVG- 

LIMIPVVRLHGTY =10 

Query: l=Jl GtfVAVEURVVGGTATEG-
LDFIGAGEILTFAEGETKKTVILTXXXXXXXXXXXXXXXXLV 24=1

G V ++ +A+ G +D+I G +TF G+ + ++ 

L 

Sbjct: =)l 

GYVTADFISflSSSASPGGVDYILHGSTVTFflHGflNLSFINISIIDDNESEFEEPIEILLT ISO 


Query: 250 YTEGGSRILPSSDTVRVNILANDNVAGIVSFflTASRSVIGHEGE —
ILflFHVIRTFPGRG 307 

GG+ +L R+ I +I>+ G++ F S+ I + IL + 

RT G 
Sbjct: 151 GATGGA-

VLGRHLVSRIIIAKSDSPFGVIRFLNflSKISIANPNSTMILSLVLERTGGLLG 20=J 
Query: 30fl NVTVNUKIIGCN LELN — FAN-FSGt2LFFPEGSLNT-

TLFVHLLDDNIPEEKEVY 35fl 

+ VNU+ +G N L N A+ SG +F EG T+ + + 

E +E + 
Sbjct: 210

ElflVNWETVGPNSflEALLPflNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETF 2b1 

Query: 351 (JVILYBVRTflGVPPAGIALLDAtJGYAAVLTVEASDEPHGVLNFA — 
LSSRFV— LLflE M12 

+ L+ V+ G A LD++ LT++ +P+GV+ FA LS +

L E 

Sbjct: 270 IIKLHLVK - 

GEAKLDSRAKJ>VTLTI(2EFGDPNGVV(2FAPETLSKICTYSEPLALE 322 
Query: M13

ANITIrJLFINREFGSLGAINVTYTTVPGMLSLKNflTVGNLAEPEVDFVPIIGFLILEEGE M72 
+1 F+R G+GIV+ L ++ ++ E DF+ 

GF + +GE 

Sbjct: 323 GPLLITFFVRRVKGTFGEIflVYU ELSSEF — DITE 

DFLSTSGFFTIADGE 370

Query: 473

TAAAINITILEDDVPELEEYFLVNLTYVGLTNAASTSFPPRLDSEGLTAflVIIDANDGAR S32 
+ A+ ++ +L D+VPE+EE +++ L S LD E 

+ AND

Sbjct: 371 SEASFDVHLLPDEVPEIEEDYVIflLV 

SVEGGAELDLEKSITUFSVYANDDPH 422 

Query: S33 GVIEb)(3(2SRFEV- — NETHGSLTLVAflRSREPLGHVS —
LFVYAflNLEAflVGLDYIFTPH 5fl7

GV R+ S++R GV+L+ + + E + 

+ + 
Sbjct: 423 

GVFALYSDRdSILIG(2NLIRSIi3INITRLAGTFGDVAVGLRISSDHKEtfPIVTENAERdL 4fl2 


Query: Sflfl ILHFADGERYKNVNIMILDDDI — PEGDE-KFtiLILTNPSPGLELGKNTI — 
-ALIIVLA b41 

++ DG YK V+++ + + + G (3L+ G G TI 

A VL 

Sbjct: 463 VVK--DGATYK-

VDVVPIKN(2VFLSLGSNFTL(3LVTVMLVGGRFYGHPTIL<2EAKSAVLP 53*) 

Query: b42

N5]>GPGVLSFNNSEHFFLREPTALYV(2ESVAVLYIVREPAflGLFGTVTV<JFIV TE bib 

+ NS+ F E TA + A ' V +G +G ++V +

E 

Sbjct: 540 VSEKAA NSflVGF— 

ESTAFflLIINITAGTSHVniSRRGTYGALSVAIJTTGYAPGLE 512 

Query: b 8 ^

VNSSNESKDLTPSKGYIVLEEGVRFKALfllSAILDTEPEflDEYFVCTLFNPTGGARLGVH 75b 
+ ++TP+ G+ G+K++ P EFVL 

A G 

Sbjct: 513 IPEFIVVGNMTPTLGSLSFSHGEC2RKGVFLUTF — 
PSPGti)PEAFVLHLSGV<JSSAPGGA<2 b50

Query: 757 VUTLITVLflNflAPLGLFSISAVENRATSIDIEEANRTVYLNVSRTNGI—
DLAVSVflWET SIM 
+++ V + + P+G+F S +R +1 + E + + L+V R G DL

+ V + + T 

Sbjct: b51 LRSGFIVAEIE-PMGVF<3FST-SSR~ 
NIIVSEDT<3MIRLHV(2RLFGFHSDL-IKVSYi2T 70S 


Query: filS VSETAFGMRGHDVVFS VFflSFLDE fl3fi

+ +A+ +V+ F<3 F E 
Sbjct: 70b TAGSAKPLEDFEPV(2NGELFFt2KF(2TE 732 

Score = 5Mb (3b.=1 bits), Expect = 4.1e-32, Sum P(3) = 4.1e-32
Identities = ^2/336 (27%), Positives = 1S7/33A (4b%)

Query: 511 PPRLDSEGLTAfiVIIDANDGARGVIEU—
(3USRFEVNETHGSLTLVA(2RSREPLGHVSLF Sbfl 
pp + + + ++H ND A G+IE+ + + FEV E G + + R

G+V+ 

Sbjct: 3fl PPEIGNISIV- 

RIIIUKNDNAEGIIEFDPKYTAFEVEEDVGLIMIPVVRLHGTYGYVTAD lb 

Query: SbT VYAfiNLEAfiVG-

LDYIFTPMILHFADGERYKNVNIMILDDDIPEGDEKFflLILTNPSPGL b27 

+<2+ A G +DYI + F G+ +NI I+DD+ E +E 

+++LT + G 
Sbjct: ^7 

FISQSSSASPGGVDYILHGSTVTFflHGflNLSFINISIIDDNESEFEEPIEILLTGATGGA 15b
Query: b2fi

ELGKNTIALIIVLANDDGPGVLSFNNSEHFFLREPTALYVflESVAVLYIVREPAdGLFGT bfl7 
LG++ ++ 11+ +D GV+ FN + P S +L +V E 

GL G

Sbjct: 157 VLGRHLVSRIIIAKSDSPFGVIRFLNflSKISIANPN 

STMILSLVLERTGGLLGE 210 

Query: bflfl VTVdFIVTEVNSSN ESKDLT-PSKGYIVLEEGVR-

FKALfllSAILDTEPEMDEYFV 741

+ V + NS +++D+ P G EG + + ++ E 

E++E F+ 
Sbjct: ail 

IflVNUETVGPNSflEALLPtfNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETFI 270 


Query: 742 CTLFNPTGGARLGVHVflTL-ITVLtfNflAPLGL — FSISAVENRATSIDIE-
EANRTVYLN 7=17 

L G A+L + + +T+ + PG+F+ ++S+E 

+ 

Sbjct: 271

IKLHLVKGEAKLDSRAKDVTLTIdEFGDPNGVVflFAPETLSKKTYSEPLALEGPLLITFF 330 

Query: 7=16 VSRTNGIDLAVSVflUETVSETAFGMRGMDVVFSVFfiSFLDESASGWCFFTL

V R G + V UE SE F + + FL S SG FFT+

Sbjct: 331 VRRVKGTFGEIHVYUELSSE FDITEDFL--STSG — FFTI 

3bb 

Score = 2Mb (Ob.I bits), Expect = Lle-ll, Sum P(3) = l-^e-n
Identities = 67/303 (2A%), Positives = 13S/303 (45%)

Query: llb2 PSSGELIFEPGEREA-TIAVNIODTVPEKEESFKVflLKNPKGGAEIGIN-
DSVTITILS 121T 
P SG F GE TI + I E EE+F ++L KG A++

VT+TI 

Sbjct: E3b 

PVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETFIIKLHLVKGEAKLDSRAKDVTLTIflE SIS 


Query: 1E20 NDDAYGIVAFAGNSL

YKf2VEEHEt2DSLVTLNV"ERLKGTYGRITIAIiJEAI>GSIS 1E7S 

D G+V FA +L Y + +E L+T V R+KGT+G I + UE 

Sbjct: 21b 

FGDPNGVVflFAPETLSKKTYSEPLALEGPLLITFFVRRVKGTFGEIMVYlJELSSEFDITE 355
Query: 1E73

DIFPTS6VILFTEG<2VLSTITLTILAJ)NIPELSEVVIVTLTRITTEGVEDSYKGATII><2I) 133S 
J> TSG +G+ ++ + +L D +PE+ E ++ L ++ EG 

6A +D +

Sbjct: 35b DFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDYVI<JL--VSVEG 

-GAELDLE 407 

Query: 1333

RSKSVITTLPNDSPFGLVGURAASVFIRVAEPKENTTTLdLfllARDKGLLGDIAIHLRACJ 13TE

+S + + ND P G+ + I + + ++12+ I R G 

GD+A+ LR 

Sbjct: ^Dfl KSITUFSVYANDDPHGVFALYSDRflSILIGfl— 
NLIRSI<3INITRLAGTFGDVAVGLRIS 4b5 


Query: 13i3 PNFLLHVDNfl-

ATENEBYVLflETIIIMKENIKEAHAEVSILPDDLPELEEGFIVTITEVN 1451 

+ H + TEN E +++K+ VI L F 

+ + V 

Sbjct: Mbb. SD HKEdPIVTENA

ERflLVVKPGATYKVDVVPIKNiaVFLSLGSNFTLflLVTVn 517 

Query: mSE LVNSDFSTGflPSV 14b4
LV F G P++ 
Sbjct: SIS LVGGRFY-GHPTI SET

Score = £4b (3b.T bits), Expect = l.^e-lT, Sum P(3) = 1-le-n
Identities = a^/334 (Eh%), Positives = 150/334 (44%)

Query: 1151

DFIPSSGELIFEPGEREATIAVNILDDTVPEKEESFKVflLKNPKGGAEIGINDSVTITIL lElfl 
D+I + F+ G+ +1 ++I+DD E EE ++ L GGA +G + 

I I 

Sbjct: 110 

DYILHGSTVTF(JHG(JNLSFINISIIDI>NESEFEEPIEILLTGATGGAVLGRHLVSRIIIA lb=l
Query: 1E1T

SNDDAYGIVAFA(3NSLYK(3VEEHE<3DSLVTLNVERLKGTYGRITIAIjIEADGSIS 1E7E 

+D +G++ F S + +++L +ER 6 G I + UE G 

S

Sbjct: 170 KSDSPFGVIRFLNflSKIS- 
IANPNSTNILSLVLERTGGLLGEIlJVNUETVGPNSflEALLP EEfl 

Query: 1S73 DIF-PTSGVILFTEGflV-

LSTITLTILADNIPELSEVVIVTLTRITTEGVEDSYKGA 13E7

BI P SG+ F EG+ + TI LTI E+ E 1+ L + E 

DS 
Sbjct: 22=1

flNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEIEVEETFIIKLHLVKGEAKLDS 2fi4 

Query: 132fl TIDflDRSKSVITTLPN-DSPFGLVGURAASVFIRV-AEPK —
ENTTTLflLfllARDKGLLG 13B3

R+K V T+ P G+V + ++ + +EP E + + 

R KG G 

Sbjct: 2fl5 

RAKDVTLTIflEFGDPNGVVflFAPETLSKKTYSEPLALEGPLLITFFVRRVKGTFG 33T 


Query: 13fl4

DIAIHLRAflPNFLLHVDNflATENEDYVLflETIIIMKENIKEAHAEVSILPDDLPELEEGF 1443 
+1 ++ F + ED++ + + EA +V 

+LPD++PE+EE + 

Sbjct: 340 EltlVYUELSSEFDI

TEDFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDY 311 

Query: 1444 IVTITEVNLVNSDFSTGflPSVRRPGMEIAEIIIIEENDDPRGIFriFHVTR

mis 

+++ v +++I + NDDP G+F + R

Sbjct: 312 VlflLVSVE GG AELDLEK — -SITUFSVYANDDPHGVFALYSDR 

431 

Score = 23? (35.b bits), Expect = 1.4e-34, Sum P(3) = 1.4e-34
Identities = 101/3b? (27%), Positives = lb5/3b? (44%)

Query: b7

SGflAVAGSDYEPVTRflUAIMflEGDEFANLTVSILPDDFPEflDESFLISLLEVHLflNISAS 12b 
S + GDY+ AG + + +SI+ I>+ E +E +E+ 

L +

Sbjct: 102 SSASPGGVDYI-LHGSTVTFflHGflNLSFINISIIDDNESEFEEP 

IEILLTGATGG 155 

Query: 127

LKNflPTIGflPNISTVVIALNGMFGVFVIYSISPNTSEDGLFVEVflEflPflTLVELMIHRT Ifib

+G+ +S ++IA + F6V N S+ + 

++ L++ RT 

Sbjct: 15b A VLGRHLVSRIIIAKSDSPFGVIRFL NflSKISI— 

ANPNSTPIILSLVLERT 204 


Query: lfl? GGSLGflVAVEURVVGGTATEGL DFIG-AGEILTFAEGETK-

KTVILTXXXXXXXX 231 

GG LG++ V U VG + E L D + F EGE +T+ILT 

Sbjct: 205 

GGLLGEIflVNIrtETVGPNSflEALLPflNRDIADPVSGLFYFGEGEGGVRTIILTIYPHEEIE 2b4

Query: 240 XXXXXXXXLVYTEGGSRILPSSDTVRVNILANDNVAGIVSF--
flTASRSVIGH EGE 213 

L +G +++ + V + I + G+V F +T S+ 

EG

Sbjct: 2bS 

VEETFIIKLHLVKGEAKLDSRAKDVTLTIflEFGDPNGVVflFAPETLSKKTYSEPLALEGP 324 

Query: 2=14 ILflFHVIRTFPGR-GNVTVNUKIIGfl-
NLELNFANFSGflLFFPEGSLNTTLFVHLLDDNI 351

+L +R G G+V U++ + ++ +F + SG +G + 

VHLL D + 
Sbjct: 3ES

LLITFFVRRVKGTFGEinVYUELSSEFDITEDFLSTSGFFTIADGESEASFDVHLLPDEV 364 
Query: 352

PEEKEVYflVILYDVRTflGVPPAGIALLDAflGYAAVLTVEASDEPHGVLNFAL-SSRFVLL 410

PE +E Y + L V G A LI + + V A+D+PHGV FAL 

S R +L 

Sbjct: 3fi5 PEIEEDYVIflLVSVE GGAELDLEKSITUFSVYANDDPHGV-- 

FALYSDRflSIL 435 


Query: mi (2EANI — TIflLFINREFGSLGAINV 433

N+ + Ifl+ I R G+ G + V 
Sbjct: 43b IGflNLIRSIfllNITRLAGTFGDVAV MbO 

Score = 230 (34.5 bits), Expect = 2.3e-14, Sum P(3) = 2.3e-14
Identities = 1fl/3ba (Sb%), Positives = lb4/3bfl (MM%)

Query: 1240 EMEflD-

SLVTLNVERLKGTYGRITIAWEADGSISDIFPTSGvTLFTEGflVLSTITLTILA 1215 
E+E+D L+ + V RL GTYG +T + + S + P 6V 6

ST+T 

Sbjct: 71 EVEEDVGLIIIIPVVRLHGTYGYVTADFISflSSSAS — P-GGVDYILHG 

STVTFflH-G 123 

Query: 1211 DNIPELSEVVIVTLTRITTEGVEDSYKGATIDflDRSKSVITTL

PNDSPFGLVGURAA 13SS 

N+ ++ +1 E +E GAT + +++ + 

+DSPFG++ + 
Sbjct: 121* 

flNLSFINISIIDDNESEFEEPIEILLTGATGGAVLGRHLVSRIIIAKSDSPFGVIRFLNfl 163

Query: 13SL, SVFIRVAEPKENTTTLflLfilARDKGLLGDIAIHLRAfl-
PNFLLHVDNflATENEDYVLflET 1414 

S I+AP +T LL + R GLLG+I ++ PN + fl + 

I> V

Sbjct: 164 SK-ISIANPN- 

STMILSLVLERTGGLLGEIflVNUETVGPNSflEALLPflNRDIADPV — SG 23=1 

Query: mis IIIMKENIKEAHAEV-
SILPDDLPELEEGFIVTITEVNLVNSDFSTGflPSVRRPGMEIAE 1473

+ E + +1 P + E+EE FI+ +++LV G+ + 

++ 

Sbjct: 240 LFYFGEGEGGVRTIILTIYPHEEIEVEETFII KLHLVK 

GEAKLDSRAKDVT- 210 


Query: m?4

IfllEENDDPRGIFMFHVTRGAGEVITAYEXXXXXXXXXXXXXXXAGSFGAVNVYIilKASPD 1533 
+ I+E DP G+ F + + + G+FG + 

VYU+ S + 
Sbjct: 211

LTIflEFGDPNGVVflFAPETLSKKTYSEPLALEGPLLITFFVRRVKGTFGEIMVYUELSSE 3S0 

Query: 1534

SAGLEDFKPSHGILEFADKflVTAniEITIIDDAEFELTETFNISLISVAGGGRLGDDVVV 1513 
EPF + G AD + A ++ ++ D E+ E + I L+SV GG

L + + 
Sbjct: 351 

FDITEDFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDYVIflLVSVEGGAELDLEKSI 410 
Query: 1514 T-VVIPflNDSPFGVF lb07

T + ND P GVF 
Sbjct: 411 TUFSVYANDDPHGVF 425 


Score = 11U (2fl.5 bits), Expect = 7.5e-11, Sum P(3) = 7.5e-11
Identities = 13^/511 (23%), Positives = 247/511 (Ml%)

Query: b?

SGflAVAGSDYEPVTRtJUAIIIlJEGDEFANLTVSILPDDFPEriDESFLISLLEVHLriNISAS 12b 
+G A I>+EPV (2+ + ++I+ D E++E F I+L V 

+ + 

Sbjct: 707 

AGSAKPLEDFEPVflNGELFFfiKFfiTEVDFEITIINDdLSEIEEFFYINLTSVEIRGLflKF 7bb 

Query: 157 LKN-flPTIGflP-NISTVVIALNGDAFGVFVIY-
SISPNTSEDGLFVEV<2E<2P(3TLVELMI 183 

N P + +++ +INDG+++ + + D +V++ 

T L 

Sbjct: 7t7 

DVNUSPRLNLBFSVAVITILDNDDLAGMDISFPETTVAVAVDTTLIPVETESTTY — LST fl24 

Query: 1B4 HRTGGSLGflVAVEURVVGGTATEGLDFIGAGEILTF--
AEGETKKTVILTXXXXXXXXXX 241 
+T L V +V T G+ I +++T ++K + T

Sbjct: 325 SKTTTILflPTNVV-AIV — TEATGVSAIPE- 
KLVTLHGTPAVSEKPDVATVTANVSIHGT flflD 

Query: £42 XXXXXXLVYTEGGSRILPSSDTVRVNILANDNVAGIVSF-- 
<2TASRSVIGHEGEIL(3FHV 211 

+VY E + + +T V I G VS +T E 

L F 

Sbjct: flfil FSLGPSIVYIEEEMKN- 

GTFNTAEVLIRRTGGFTGNVSITVKTFGERCAflHEPNALPF — =137 
Query: 30D

IRTFPGRGNVTVNUKIIGl3NLELNFANFSG(3LFFPEGSLNTTLFVHLLDDNIPEEKEVY(3 351 
R G N+T Id + E+F +LF+G + V +LDD+ 

PE +E + 

Sbjct: =l3fl -RGIYGISNLT — WAVE 

EEDFEEUTLTLIFLDGERERKVSViJIL]>I>DEPEG(JEFFY 11Q 

Query: 3bQ VILYDVRT<2GVPPAGIALLDA<2 GYAA--

VLTVEASDEPHGVLNFALSSRFVL-LflEA 413 
V L + P G +++ + G+AA ++ + SD +G++ F+ S+

L L+E 

Sbjct: in VFLTN 

P(2GGAflIVE6KDDTGFAAFAflvTITGSDLHNGIIGFSEESi2SGLELREG 1044 

Query: 414 NITIULFI NREFGSLGAI-

NVTYTTVPGIILSLKNtJTVGNLAEPEVDFVPIIGFL 4bb 

+ +L + NR F + VT ++ L+ V NL E E+ 

V G 

Sbjct: 1D4S AVnRRLHLIVTRflPNRAFEDVKVFURVTLNKT— VVVH3KDGV-NLME- 
ELfiSVS — GTT 1Q1&

Query: 4b? ILEEGETAAAINITILEDDVPELEEYFLVNL--
TYVGLTMAASTSFPPRLDSEGLTAflVI 524 



G+T I+I + + VP++E YF V L G + S F

E +Q + 
Sbjct: lOII 

TCTN6fiTKCFISIELKPEKVPQVEVYFFVELYEATAGAAINNSARFAflIKILESDES(2SL 1156 


Query: 5E5 IDANDGARGVIEWflflSRF EVNETHGS-

LTLVA(2RSREPLGHVSLFVYA<2NLEAl2VGL SflD 

+ + G+R + +++ +V G+ L + S + L 

A G 

Sbjct: 1151

VYFSVGSRLAVAHKKATLISLflVARDSGTGLMnSVNFSTflELRSAETIGRTIISPAISGK 'lElfl 

Query: Sfll DYIFTPniLHFADGERYKNVNirilLDD--
DIPEGDEKFflLILTNPSPGLELGKNTIALII b3B 
D++ T L F G+R +++++ + + ++FA++L +P G +

K I 

Sbjct: isn

DFVITEGTLVFEPG(3RSTVLDVILTPETGSLNSFPKRF(3IVLFDPKGGARIDICVYGTANI lS7fi 

Query: b31 VLAND-DGPGVLSFNNSEH b5b

L +D J> + + H 
Sbjct: 1E71 TLVSDADSCAIUGLADCILH 1517 

Score = Iflfi (ES.E bits), Expect = LEe-33, Sum P(3) = LEe-33
Identities = 84/331 (E5%), Positives = 14b/3E1 (41%)

Query: 112b SVTVKSLNS

SKNLIALVGAHSHIYELAYISSHSDFIPSSGELIFEPGEREATIAV llflO 

S+TVK+ N +- G + I L + DF + LIF 

GERE ++V

Sbjct: 117 SITVKTFGERCAflflEPNALPFRGIYG- 
ISNLTWAVEEEDFEEflTLTLIFLDGERERKVSV 175 

Query: llfll NILDDTVPEKEESFKVflLKNPKGGAEI— GINDS VTITILSNDDAY-

GIVAFAflNS 1E33

ILDD PE +E F V L NP+GGA+I G +D+ + I++ D + 

GI+ F++ S 
Sbjct: 17b 

fiILDDDEPEGflEFFYVFLTNPfiGGA<2IVEGKDDTGFAAFAMVIITGSDLHNGIIGFSEES 1035 


Query: 1E34 LYKdVEEMEflDSLVT LNVERLKG-TYGRITIAWEAD-

GSISDIFPTSGVILFTEGlJV lEflfl 

+ E+ + +++ LVR + CV 

L E a 
Sbjct: lD3b —

C3SGLELREGAVP1RRLHLIVTR(3PNRAFEI>VKVFURVTLNKTVVVL(3K]>GVNL[1EELi2S 1013 

Query: 1ES1 LSTITLTILADNIPELS-EVVIVTLTRITTEGVEDSYK—
GATIDfiDRSKSVITTLPND 1344 
+S T + +S E+ + ++ + Y+ GA 1+ +

I L +D 
Sbjct: 1D14 

VSGTTTCTMGCTKCFISIELKPEKVPfiVEVYFFVELYEATAGAAINNSARFACJIKILESl) 1153 

Query: 1345 SPFGLVGURAASVFIRVAEPKENTTTL(2L<2IARDKG — LLGDIAI

HLRAflPNFLLHV 1311 

LV + S R+A + T + Lfl+ARD G L+ + LR+ 

+ 
Sbjct: 1154 ESflSLVYFSVGS — -

RLAVAHKKATLISHaVARDSCTGLnHSVNFST(2ELKSAETIGRTI 1510 

Query: 1400 DNflATENEDYVLflETIIIflKENIKEAHAEVSILPD 1434
+ A +D+V+ E ++ + + +V + P+

Sbjct: 1211 ISPAISGKDFVITEGTLVFEPGflRSTVLDVILTPE 12M5 

Score = IBb (2?.I bits), Expect = 2.5e-13, Sum P(3) = 2.5e-13
Identities = 75/242 (3D%), Positives = 113/242 (4b%)


Query: 120b

EIGINPSVTITILSNPDAYGIVAFAcJNSLYKfiVEEriElJDSLVTLNVERLKGTYGRITIAU 12b5 
EIG V I 1+ ND+A GI+ F + Y EE L+ + V RL 

GTYG +T + 

Sbjct: i»o EIGNISIVRIIiriKNDNAEGIIEF —
DPKYTAFEVEEDVGLIfllPVVRLHGTYGYVTADF 17 

Query: 12bb EADGSIS 

DIFPTSGVILFTEGtJVLSTITLTILADNIPELSEVVIVTLTRITTEGV 1350 
+ S + » + F G(3 LS I ++I+ DN E E + +

LT T G 
Sbjct: Ifl 

IS(2SSSASPGGVDYILHGSTVTFflHG(2NLSFINISIIDDNESEFEEPIEILLTGAT — G- 151 
Query: 1321

EDSYKGATI1)(3DRSKSVITTLPN]>SPFGLVGIiIRAASVFIRVAEPKENTTTL(JL(3IARD1CG 1380 

GA + + +1 +DSPFG++ + S I +A P +T L 

L + R G 

Sbjct: 1SS GAVLGRHLVSRIIIA-KSDSPFGVIRFLNfiSK-ISIANPN- 

STI1ILSLVLERTGG 20b

Query: 13fll LLGDIAIHLRAQ-PNFLLHVDNUATENEDYVLflETIIIMKENIKEAHAEV-
SILPDDLPE 1433 

LLG+I ++ PN+fl + DV + E + 

+1 P + E

Sbjct: 20? LLGEIfiVNliJETVGPNSflEALLPtfNRDIADPV-- 
SGLFYFGEGEGGVRTIILTIYPHEEIE 2b4 

Query: 1431 LEEGFIVTI 1447
+EE FI+ +

Sbjct: 2b5 VEETFIIKL 273 

Score = 171 (2b.1 bits), Expect = 1.4e-34, Sum P(3) = 1.4e-34
Identities = bS/244 (2b%), Positives = 114/244 (4b%)


Query: 551 DYIFTPHILHFADGERYKNVNIillLDPBIPEGDEKFULILTNPSPGLEL—
GKN T b33 

D+ + L F DGER + V++ ILDDD PEG E F + LTNP G ++ 

GK+ 

Sbjct: 154

BFEE(2TLTLIFLDGERERKVSVtJILPl>PEPEG(2EFFYVFLTNP(3GGA(2IVEGKI>I>TGFAA 1013 
Query: b34 IALIIVLANDDGPGVLSFNNSEHFFLREPTALYV(3ESVAVLYIVREPA(2G — 

— lfgtv baa 

A++I+ +D G++ F+ L ++ L + R+P +

+F V 

Sbjct: 1D14 FAMVIITGSDLHNGIIGFSEESflSGLELREGAVnRR— 
LHLIVTRflPNRAFEDVKVFbJRV 1071 




Query: bflT TV<J —

FIVTEVNSSNESKDLTPSKGYIVLEEGVRFKALfilSAILDTEPEriDEYFVCTLFN 7Mb 

T+ +V + + N ++L G G + I + P+++ 

YF L +

Sbjct: 1072 

TLNKTVVVLflKDGVNLMEELflSVSGTTTCTriGlSTKCFISIELICPEKVPdJVEVYFFVELYE 1131 

Query: 7M7 PTGGARLGVHVfl-
TLITVL(JN(JAPLGLFSISA VENRATSIDIEEANRTVYLNVSRTNGID ADS

T GA + + I +L++ L S V +R ++ ++A + L 

V+R +G 

Sbjct: 1135 ATAGAAINNSARFA(JIKILESDES(JSLVYFS-VGSRL-AVAHKKAT- 
LISLiJVARDSGTG llflfl 


Query: 30b LAVSViJWET am

L +SV + T 
Sbjct: liaT LflflSVNFST 11^7 

Score = 17M (2b.l bits), Expect = H-le-35, Sum P(3) = M-le-32
Identities = 53/200 (21%), Positives = 102/200 (51%)

Query: 1151

DFIPSSGELIFEPGEREATIAVNILDDTVPEKEESFKVtJLKNPKGGAEIGINDSVT-ITI 1217 
DF+ +SG GE EA+ V++L D VPE EE + +(JL + +GGAE+ +

S+T ++ 
Sbjct: 35b 

DFLSTSGFFTIADGESEASFDVHLLPDEVPEIEEDYVIlJLVSVEGGAELDLEKSITIilFSV MIS 

Query: 121fl LSNDDAYGIVAFA(JNSLYKiJVEEI1E<JDSL"
VTLNVERLKGTYGRITIAUEADGSISDIF 127S 

+NDD +G+ A + +(J + (3+ + + +N+ RL GT+G + + 

SD 

Sbjct: mb YANDDPHGVFALYSD 

R(JSILIG(JNLIRSI(2INITRLAGTFGDVAVGLRIS SDHK tbi

Query: 127b PTSGVILFTEGtJVLSTITLTILADNIPELSEVVI

VTLTRITTEGVEDSYKGA-TI 1321 

V E (J++ T D +P ++V + TL +T V 

+ G TI

Sbjct: H7Q 

EflPIVTENAERflLVVKDGATYKVDVVPIKNflVFLSLGSNFTLiaLVTVriLVGGRFYGriPTI 521 

Query: 1330 DdDRSKSVITTLPNDSPFGLVGURAAS 135b
(J + +KS + + + VG+ + +

Sbjct: 530 LflE-AKSAVLPVSEKAANSUVGFESTA 555 

Score = ItS (21.a bits), Expect = 4.3e-2i|, Sum P(3) = M-3e-2i»
Identities = 10M/31b (2b%), Positives = 170/31b (12%)


Query: afi

EGDEFANLTVSILPDDFPEMDESFLISLLEVHLMNISASLKNdPTIGiJPNISTVVIALNG 1H7 
+G+ A+ V +LPD+ PE++E ++I L+ V A L + +1 + 

+ N 

Sbjct: 3ba DGESEASFDVHLLPDEVPEIEEDYVltJLvSVEG GAELDLEKSI

TUFSVYAND 111 
Query: 14fl

DAFGVFVIYSISPNTSEDGLFVEVflEflPflTLVELMIHRTGGSLGflVAVEIilRVVGGTATEG SO? 

P 6VF +YS D + + + +++ I R G+ G VAV R+ 

+ 

Sbjct: 420 DPHGVFALYS -

DRflSILIGflNLIRSIfllNITRLAGTFGDVAVGLRISSDHKEflP 472 

Query: 20fi LDFIGAGEILTFAEGETKKTVILTXXXXXXXXXXXXXXXXLVYTE-GGSRI-
-LPSS-DT 2b3 

+ A L +G T K ++ LV G R

+P+ 

Sbjct: 473 

IVTENAERlJL VVKDGATYKVDVVPIKN(2VFLSLGSNFTL(3LVTVriLVGGRFYGriPTIL(JE 532 

Query: 2b4 VRVNIL-ANDNVAGI-VSFflTASRSVIGHEGEILflFHVIRTFPGR-
GNVTVNUKI-IGfiN 311 

+ +L ++ A V F++ + ++ HV+ + G G ++V 

Id 

Sbjct: S33 AKSA VLPVSEKAANSflVGFESTAFflLMNITAGTS — 
HVIHSRRGTYGALSVAUTTGYAPG 51Q

Query: 320 LEL — —

NFANFSGflLFFPEGSLNTTLFVHLLDDNIPEEKEVYflVILYDVRTflGVPP 372 . 

LE+ N G L F G +F + P E + + L 

V++ P

Sbjct: 511 LEIPEFIVVGNIITPTLGSLSFSHGEflRKGVFLUTFPS-— 
PGUPEAFVLHLSGVflSSA— P b4b 



Query: 373

AGIALLDAflGYAAVLTVEASDEPHGVLNFALSSRFVLLflEANITIflLFINREFG-SLGAI 431

G L G+ + A EP GV F+ SSR +++ E I+L + R 

FG I 

Sbjct: b47 GGAflL — RSGF 

IVAEIEPMGVFflFSTSSRNIIVSEDTflMIRLHVflRLFGFHSDLI bll 


Query: 432 NVTYTTVPGtlLS-LKN-flTV — GNLA EPEVDF-

VPIIGFLILEEGETAAAINITIL 432 

V+Y T G L++ + V G L + EVDF +11 LEE 

IN+T + 

Sbjct: 700 KVSYflTTAGSAKPLEDFEPVflNGELFFflKFflTEVDFEITIINDfl-
LSEIEEFFYINLTSV 75fl 

Query: 4fl3 E 463
E 

Sbjct: 751 E 751



Score = 142 (21.3 bits), Expect = 5.6e-05, Sum P(3) = 5.6e-05
Identities = 54/175 (30%), Positives = 7b/17S (43%)

Query: 1435

DLPELEEGFIVTITEVNLVNSDFSTGflPSVRRPGPIEIAEiniEENBDPRGIFnFHVTRGA 1414 
DL + G+ TI E N + D flP +1 I+I +ND+ 61 

F 

Sbjct: lb DLYDFGRGYDFTIflE-NGLflID flPP- 

EIGNISIVRIIIflKNDNAEGIIEFDPK- — bb



Query: 1415 GEVITAYEXXXXXXXXXXXXXXXAGSFGAVNVYU—
KASPDSAGLEDFKPSHGILEFADK 1552 
TA+E G++G V + ++S S G D+

+ F 

Sbjct: b7 — 

YTAFEVEEDVGLIMIPVVRLHGTYGYVTADFISflSSSASPGGVDYILHGSTVTFflHG 123 


Query: 1553

CJVTAfllEITIIDDAEFELTETFNISLISVAGGGRLGDDVVVTVVIPfiNDSPFGVFGF lbOl 
a + I I+IIDD EE E I L GG LG +V ++I 

++DSPFGV F 
Sbjct: 124

(2NLSFINISIIDDNESEFEEPIEILLTGATGGAVLGRHLVSRIIIAKSDSPFGVIRF ISO 






Score = 125 (lfl.6 bits), Expect = 4.0e-25, Sum P(3) = 4.0e-25
Identities = 77/30A (25%), Positives = 134/30A (43%)



Query: 1141 LVGAHSHIYELAYISSHS DFIP-

SSGELIFEPGEREATIAVNILDDTVPEKEES 1113 

L fi HS + +++Y ++ I>F P +GEL F+ + E + I ++D 

+ E EE 
Sbjct: bll

LFGFHSDLIKVSYflTTAGSAKPLEDFEPVlJNGELFFflKFQTEVDFEITIINDQLSEIEEF 750 

Query: 1114 FKVfiLKNP--

KGGAEIGINDSVTITILSNDDAYGIVAFAflNSLYKflVEEMEcJDSLVTLNV 1251 
F + L++G + +NS++D+++ N

D L +++ 

Sbjct: 751 FYINLTSVEIRGLflKFDVNUSPRLNL DFSVAVITILDN 

DDLAGMDI 71b 

Query: 1252

ERLKGTYGRITIAWEADGSISDIFPTSGVILFTEGtJVLSTITLTILADNIPELSEVVIVT 1311 

++ T+A B+++ S LT + ++ T+ +E 

+ V + 

Sbjct: 717 SFPETTVAVAVITTLIPVETESTTYLSTS- 

KTTTILflPTNVVAIVTEATGVSAIP fl50

Query: 1312 

LTRITTEGVEDSYKGATIDdDRSKSVITTLPNDSPFGLVGWRAASVFIRVAEPKENT-TT 1370 

+T G T V T N S 6 + V+I E 

K T T

Sbjct: fl51 EKLVTLHG TPAVSEKPDVATVTANVSIHGTFSLGPSIVYIE- 

EEMKNGTFNT 101 

Query: 1371 LflLfllARDKGLLGDIAIHLRA fiPNFL LHVDNfl —

ATENEDYVLflETI 1415

++ I R G G+++I ++ +PN L + N A E 

ED+ <2 

Sbjct: 102 

AEVLIRRTGGFTGNVSITVKTFGERCAflMEPNALPFRGIYGISNLTUAVEEEDFEEflTLT Ibl 


Query: 141b IIMKENIKEAHAEVSILPDDLPELEEGFIVTIT 144S

+1 + +E V IL DD PE +E F V +T 
Sbjct: 1b2 LIFLDGERERKVSVCILDDDEPEGlJEFFYVFLT 114 

Score = 123 (lfl.5 bits), Expect = t..0e-2fi, Sum P(3) = b-0e-2a
Identities = 11/372 (24%), Positives = 150/372 (40%)
Query: 3flfe. VLTVEASDEPHGVLNFALSSRFVLLfiEA-- NITI—
(3LFINREFGSLGAINVTYTTV-- 43fl 

V TV A+ HG F+L V ++E NT ++ I R G G 

+++T T 

Sbjct: atfl VATVTANVSIHGT--

FSLGPSIVYIEEEMKNGTFNTAEVLIRRTGGFTGNVSITVKTFGE =125 

Query: 43=1 PGMLSLKN-flTVGNL—

AEPEVDFVPIIGFLILEEGETAAAINITILEDDVPEL 4fl=l 
P L + + NL A E DF LI +GE +++

IL+DD PE 
Sbjct: 12L, 

RCAflMEPNALPFRGIYGISNLTUAVEEEDFEEflTLTLIFLDGERERKVSVfllLDDDEPEG IfiS 

Query: MID EEYFLVNLTYVGLTHAASTSFPPRLDSEGLTA-- (2VIIDANDGARGVI—
EUfiflSRFEV 544 

+E+F V LT D G A VII +D G+I E 

as E+ 

Sbjct: ^flh fiEFFYVFLT 

NPflGGAfllVEGKDDTGFAAFAnVIITGSDLHNGIIGFSEESdSGLEL 1041

Query: SMS NE — THGSLTLVAflRS-REPLGHVSLF—
VYAflNLEAflVGLDYIFTPIIILHFADGERYKN 5=1=1 

E LL+R V+FV +D+ |_ 

G

Sbjct: 1042 

REGAVNRRLHLIVTRi2PNRAFEDVKVFURVTLNKTVVVL(20GVNLf1EEL£2SVSGTTTCT 1101 

Query: L.00 VNiniLDDDIPEGDEKFtJLILTNPSPGLELGKNT-

IALIIVLANDDGPGVLSF bSl

++I + + +P+ +F+L +G++ AI+L 

+D+ ++ F 
Sbjct: HQS 

MGflTKCFISIELKPEKVPfiVEVYFFVELYEATAGAAINNSARFAfllKILESDESfiSLVYF llbl 


Query: b52 NNSEHFFLREPTALYV(3ESVAVLYIVREPA(3GLFGTVTVi2FIVTEVNSSNE-
-SKDLTPS 701 

+ + A + L + R+ GL ++V F E+ S+ 

++P+ 

Sbjct: Hb2 SVGSRLAVAHKKATLIS LCVARDSGTGLM—

MSVNFSTflELRSAETIGRTIISPA 1214 

Query: 710 K6YIVLEEGVRFKAL<2ISAILD 731

K +++ E + F+ as +LD 
Sbjct: 121S ISGKDFVITEGTLVFEPGflRSTVLD 123=1

Score = 120 (lfl.Q bits), Expect = l.fle-22, Sum P(3) = l-Se-22
Identities = 77/31b (24%), Positives = lS7/31b (40%)

Query: 125S KGTYGRITIAUE ADGS

ISDIFPTSGVILFTEG(2VLSTITLTILADNIPEL 1304 

+GTYG +++AU AG + ++ PT G + F+ G+ + L 

P 

Sbjct: S73 

RGTYGALSVAUTTGYAPGLEIPEFIVVGNHTPTLGSLSFSHGEflRKGVFLUTFPS — PGbl b30
Query: 1305

SEVVIVTLTRITTEGVEDSYKGATIDflDRSKSVITTLPNDSPFGLVGURAASVFIRVAEP 13b4 
E ++ L+ GV+ S G 6 RS ++ + P G+ + +S

I V+E 

Sbjct: b31 PEAFVLHLS GV<2SSAPGGA~(2LRSGFIVAEI--- 

EPMGVFfiFSTSSRNIIVSE- b?"! 


Query: 13b5 KENTTTL(3L(2IARDKGLLG]>IAIHLRA(2PNFLLHV]>Ni2ATENEl>YV-
LflETIIIMKENIK 1423 

+T ++L + R G D+ I + £J A ED+ +(2 

+ ++ 

Sbjct: bBO — DTflMIRLHV(2RLFGFHSDL-IKVSYl2TTA

GSAKPLEDFEPVfiNGELFFflKFiJT 731 

Query: 1M2M EAHAEVSILPDDLPELEEGFIVTITEVNLVN-
SDFST6(2PSVRRPGf1EIAEIHIEENDDP 14fl2 
E E++I+ D L E+EE F + +T V + F +A I

I +NDD 

Sbjct: ?32 

EVDFEITIINDflLSEIEEFFYINLTSVEIRGLflKFDVNUSPRLNLDFSVAVITILDNDDL 711 

Query: 14fl3 RGI-FMFHVTRGAGEVITAY

EXXXXXXXXXXXXXXXAGSFGAVNVYUKASPDSAGLE 153fl 

G+FTAVT E V + 

+A+ SA E 
Sbjct: 712 

AGnPISFPETTVAVAVDTTLIPVETESTTYLSTSKTTTILflPTNVVAIVTEATGVSAIPE B51

Query: 1531 DFKPSHGILEFADKiSVTAMIEITIIDDAEFEL 157D

HG ++K A + + ' F L 
Sbjct: AS2 KLVTLHGTPAVSEKPDVATVTANVSIHGTFSL flfl3 






Score = 113 (17.0 bits), Expect = 1.4e-34, Sum P(3) = 1.4e-34
Identities = 2fl/a? (32%), Positives = 50/57 (57%)



Query: 115b SHSDFIPSSGELIFEPGEREATIAVNILDDT —
VPEKEESFKVflLKNPKGGAEIG-INDS 1212

S DF+ + G L+FEPG+R + V + +T + + F++ L +PKGGA 

I + + 
Sbjct: 121b 

SGKDFVITEGTLVFEPGflRSTVLDVILTPETGSLNSFPKRFfllVLFDPKGGARIDKVYGT 1275 


Query: 1213 VTITILSNDDAYGIVAFAcJNSLYKflVEE 1240

IT++S+ D+ I A + L++ V + 
Sbjct: 127b ANITLVSDADS<2AIUGLA-l>c2LHi2PVND 13D2 

Score = 13 (14.0 bits), Expect = 4.1e-32, Sum P(3) = 4.1e-32
Identities = 57/222 (25%), Positives = 10/222 (40%)

Query: 1404 TENEDYVL— (JETIIIMKENIKEAHAE— VSILPDDLPEL

EEGFIVTITEVN 1451 
TE+ Y+ + T 1+ N+ E VS +P+ L L E+

+ T+T 

Sbjct: filb 

TESTTYLSTSKTTTILflPTNVVAIVTEATGVSAIPEKLVTLHGTPAVSEKPDVATVTANV fl75 

Query: 1452 LVNSDFSTGfJPSVRRPGMEIAEIIHEENDDPRGIFUFHVTRGAGEV-
ITAYEXXXXXXXX 1S1D 

++ FS G PS+ + I E 11 + + + G V IT 



Sbjct: 67b SIHGTFSLG-PSI 

VYIEEEMKNGTFNTAEVLIRRTGGFTGNVSITVKTFGERCAflM 130 
Query: 1511

XXXXXXXAGSFGAVNVYIdKASPDSAGLEDFKPSHGILEFADKflVTAfllEITIIDDAEFEL 1S7D

G +G N+ III EBF+ L F D + + + 

I+DD E E 

Sbjct: =131 EPNALPFRGIYGISNLTUAVEE 

EDFEEflTLTLIFLDGERERKVSVfllLDDDEPEG 165 


Query: 1571 TETFNISLISVAGGGRL — GDD VVVTVVIPflNDSPFGVFGFEEKTVS

IblS 

EF+L+ GG++ GD V+I +D 6+ GF E++ S 

Sbjct: 16b flEFFYVFLTNPflGGA(2IVEGKDDTGFAAFAf1VIITGSDLHNGIIGFSEES(2S 
1D37

Score = 13 (14.0 bits), Expect = l.Qe-16, Sum P(3) = 1-Oe-lfl
Identities = 51/236 (21%), Positives = 1Q7/H3fi (44%)

Query: bOO VNIMILDDDIPEGDEKFflLILTNPSPGLELGKNT-
IALIIVLANDDGPGVLSFNNSEHFF bS6 

++I + + +P+ + F+L +G + + AI+L +D+ ++ 

F+ 

Sbjct: 1101 

ISIELKPEKVP(3VEVYFFVELYEATAGAAINNSARFA(3IKILES]>ES(3SLVYFSVGSRLA llbfi

Query: b51 LREPTALYV(3ESVAVLYIVREPAiJGLFGTVTV(2FIVTEVNSSNE—

SKDLTPS KGYI 713 

+ A + L + R+ GL ++V F E+ S+ 

++P+ K ++

Sbjct: Hbl VAHKKATLIS LflVARDSGTGLII — 

MSVNFSTlJELRSAETIGRTIISPAISGKDFV 1221 

Query: 714 VLEEGVRFKALfllSAILDT-^-EPE UDEY FVCTLFNPTGGARLG-

VHVfiTLITVL 7b4

+ E + F+ (2 S +LD PE ++ + F LF+P GGAR+ V+ 

IT+ + 

Sbjct: 1222 

ITEGTLVFEPGflRSTVLDVILTPETGSLNSFPKRFfllVLFDPKGGARIDKVYGTANITLV 1261 


Query: 7b5 <2N<3APLGLFSISAVENRATSIDI-
EEANRTVYLNVSRTNGIDLAVSVfllilETVSETAFGMR 823 

+ ++ ++ ++ + DI T+ + V+ T D +S + 

+ 

Sbjct: 1262 SDADSdAIUGLADflLHflPVNDDILNRVLHTISflKVA-
TENTDEflLSAWIHLIEKIT — TE 133fl 

Query: A24 GMDVVFSV fi31
G FSV 
Sbjct: 1331 GKIflAFSV 134b

Score = 12 (13.8 bits), Expect = 1.5e-25, Sum P(3) = 1.5e-25
Identities = 44/177 (24%), Positives = 62/177 (4b%)

Query: b8D

PAflGLFGTVTVflFIVTEVNSSNESKDLTPSKGYIVLEEGVRFKALfllSAILDTEPEflDEY 731 
P +G++G + + V E + E + LT ++ +G R + + + + D 

EPE E+ 
Sbjct: 13y PFRGIYGISNLTUAVEEEDF — EEflTLT

LIFLDGERERKVSVfllLDDDEPEGtJEF 166 

Query: 7M0 FVCTLFNPTGGARL

GVHVflTLITVLflNflAPLGLFSISAVENRATSIDIEEAN- 711

F L NP GGA++ G ++ + + G+ S E + 

+++ E 

Sbjct: W FYVFLTNPflGGAOIVEGODTGFAAFAtlVIITGSDLHNGIIGFS— 
EESfiSGLELREGAV 10Mb 


Query: 712 -RTVYLNVSRT-NGIDLAVSVfiliJE-TVSETAF

GURGHDVVFSVFtJSFLDESASGli) 6M3 

R ++L V+R N V V Id T+++T G+ 11+ + SV + 

Sbjct: 10M7 

l1RRLHLIVTRflPNRAFEDVKVFblRVTLNKTVVVLl2KDGVNLi1EELi3SVSGTTTCTHG(2TK 110b

Query: 8MM CFFTLE 6M1
CF ++E 

Sbjct: 1107 CFISIE 111S 


Score = 11 (13.7 bits), Expect = 6.6e-32, Sum P(3) = 6.6e-32
Identities = Ml/153 (32%), Positives = 70/153 (MS%)

Query: IMbb 

RPGMEIAEIMIEENDDPRGIFriFHVTRGAGEVITAYEXXXXXXXXXXXXXXXAGSFGAVN 1SBS

R G +AEI +P G+F F + + +1 + + 

+ + 

Sbjct: b52 RSGFIVAEI EPMGVFflFSTS-- 

SRNIIVSEDTtjrilRLHVflRLFGFHSD LIK 700 


Query: iSEb VYUKASPDSAG-LEDFKP- 
SHGILEFADKflVTAIIIEITIIDDAEFELTETFNISLISVAG 1563 

V ++ + SA LEDF+P +G L F U EITII+D E+ E F 

I+L SV 
Sbjct: 701

VSY(JTTAGSAKPLEl>FEPV(2NGELFFfllCF(2TEV]>FEITIINI>(JLSEIEEFFYINLTSVEI 7b0 

Query: ISAM GG RLGDDVVVTVV-IPtJNDSPFGV-FGFEEKTVS IblS

G RL D V V+ I ND G+ F E TV+ 

Sbjct: 7bl RGLCJKFDVNIilSPRLNLDFSVAVITILBNDDLAGMDISFPETTVA 60M

Score = b5 (1.8 bits), Expect = fl.fle-ET, Sum P(3) = fl.fie-HT
Identities = 2b/Tl (2b%), Positives = 50/T=J (SO%)

Query: 1232 NSLYKflVEEMElJDSLVTLNVERLKGTYGRITIAlilEADGS ISDIF--

PTSGVILFTE 1265 

NS K+ + + D ++++ GT IT+ +AD ++D P 

+ IL 

Sbjct: 1250 NSFPKRFfllVLFDPKGGARIDKVYGT- 
ANITLVSDADSflAIUGLADtaLHflPVNDDIL— 1305

Query: 126b GflVLSTITLTILADNIPELSEVVIVTLTRITTEGVEDSYKGAT 1326

+VL TI++ + +N E ++ + +ITTEG ++ A+ 
Sbjct: 130b NRVLHTISMKVATENTDEfiLSAHMHLIEKITTEGKIfiAFSVAS 13M6 

55 

Score = 48 (7.2 bits), Expect = 1.1e-27, Sum P(3) = 1.1e-27
Identities = 23/115 (20%), Positives = 44/115 (38%)
fiuery: 11 «H TAYEXXXXXXXXXXXXXXXAGSFGAVNVYLIKAS

PDSAGLEDFKPSHGILEFAD 1551 

TA++ G++GA++V Id P+ + + P+ 

G L F + 
5 Sbjct: 55M 

TAFflLMNITAGTSHVNISRRGTYGALSVAWTTGYAPGLEIPEFIVVGNIITPTLGSLSFSH tl3 

fiuery: 1552 KflVTAIIIEITIIMAEFELTETFNISLI-- 
SVAGGGRLGDDVVVTVVIPflNDSPFGVFGF lfc.01 
10 + + + + ++S + S GG +L +V + 

P GVF F 

Sbjct: blM GEflRKGVFLUTFPSPGIi)PEAFVLHLSGV(2SSAPGGA<2LRSGFIVAEI 

EPNGVFCJF bbfl 



Pedant information for DKFZphamy2_10p7, frame 3

Report for DKFZphamy2_10p7.3

[LENGTH] 1615
[MW] 177600.58
EpIJ H-37 

[HOMOL] TREMBL:AF055084_1 gene: "VLGR1", product: "very large G-protein coupled receptor-1"; Homo sapiens very large G-protein coupled receptor-1 (VLGR1) mRNA, complete cds. 5e-24
CBL0CKS3 BPQmT3A 

[BLOCKS] BL00713B Sodium:dicarboxylate symporter family proteins

EBLOCKSJ PR01D03A 

EBLOCKSH PR00M12C 

EBLOCKSJ BL00S2ME 
[PIRKW] heart 1e-08

[PIRKW] ion transport 1e-08

[PIRKW] transmembrane protein 3e-08

[PIRKW] phosphoprotein 2e-08

[PIRKW] membrane protein 1e-08

[PROSITE] MULTICOPPER_OXIDASE1 1

[KW] All_Beta

[KW] LOW_COMPLEXITY 2.60 %



SE<3 DAUADAldALYTCATLCLKEflACSAFSFFSASEGPlJCFUnTSIOSPAVNNSDFWTYRKNriT 

45 SEG xxxxxxxxxxx 

PRD ccchhhhhhhhchhhhhhhhhhheeeeeecccccceeeeeeeccccccccceeeecccee 

SEt2 RVASLFSGflAVAGSDYEPVTRflUAinflEGDEFANLTVSILPDDFPEnDESFLISLLEVHL 

SEG 

50 PRB eeeeeccccccccccceeeceeeeeeccccceeeeeeeeccccccchhhhhhhhhhhhhh 

SEA PINISASLKN(JPTIG<JPNISTVVIALNGDAFGVFVIYSISPNTSE]>GLFVEVl2EflP(3TLVE 

SEG 

PRD hccccccccccccccccceeeeeeecccceeeeeeeeecccccccceeeeeeecccceee 



55 



SEfl LMIHRTGGSLGflVAVEWRVVGGTATEGLDFIGAGEILTFAEGETKKTVILTILDDSEPED 

SEG xxxxxxxxx 

PRD eeeeecccccceeeeeeecccccccccccccccceeeeeccccceeeeeeeeeccccccc 
SE<J DESIIVSLVYTEGGSRILPSSDTVRVNILANDNVAGIVSFinASRSVIGHEGEILflFHVI 

SEG xxxxxxx 

PRD ccceeeeeeeccccccccccccceeeeeeccccceeeeeeeccceeeeccccceeeeeee 

5 

SE(3 RTFPGRGNVTVNQIKIIGflNLELNFANFSGflLFFPEGSLNTTLFVHLLDDNIPEEKEVYflV 

SEG 

PRD eccccccceeeeeeeecccccccccccccceeecccceeeeeeeeeecccccccccceee 

10 SEl3 ILYDVRT<2GVPPAGIALLDAflGYAAVLTVEASDEPHGVLNFALSSRFVLL<2EANITIfll_F 

SEG 

PRD eeccceeeeccchhhhhhhhccccceeeeeecccccceeeeeeceeeeeecccccceeee 

SEfi INREFGSLGAINVTYTTVPGHLSLKNfiTVGNLAEPEVDFVPIIGFLILEEGETAAAINIT 

15 SEG 

PRD cccccccceeeeeeecccccccccccccccccccccceeeeeeeeeeeccccccccceee 

SE<3 ILEDDVPELEEYFLVNLTYVGLTMAASTSFPPRLDSEGLTAt3VIIDANDGARGVIEli)fl<3S 

SEG 

20 PRD eccccchhhhhheeeeeeeecceeecccccccccccccceeeeeeeccccceeeeeeccc 

SE(3 RFEVNETHGSLTLVA<2RSREPLGHVSLFVYA<2NLEA(2VGLDYIFTPMILHFADGERYKNV 

SEG 

PRD eeeecccccceeeeeeccccccceeeeeeeeccccccccccccccceeeecccccceeee 

25 

SEfi NIMILDDDIPEGDEKFflLILTNPSPGLELGKNTIALIIVLANDDGPGVLSFNNSEHFFLR 

SEG 

PRD eeeeeccccccccceeeeeeeccccccccccceeeGeeeecccccceeeeeeccceeeee 

30 SE<2 EPTALYVflESVAVLYIVREPAflGLFGTVTVflFIVTEVNSSNESKDLTPSKGYIVLEEGVR 

SEG 

PRD ccceeeeccchhhhhhhhhcccccceeeeeeeeeeeccccccccccccccceeeeeccce 

SE<2 FKAL(JISAILDTEPE(1DEYFVCTLFNPTG6ARLGVHV(3TLITVL(2N<2APLGLFSISAVEN 

35 SEG 

PRD eeeeeeeeecccchhhhhhheeeeecccccceeehhhhhhhhhhhhhcccceeeeeecch 

SE(3 RATSIDIEEANRTVYLNVSRTNGIDLAVSVdUETVSETAFGflRGMDVVFSVFlSSFLDESA 

SEG 

40 PRD hhhhhccccccceeeeeeeccccchhhhheeeeeccceeeeccccceeeeeeeecccccc 

SEU SGUCFFTLENLIYGIIILRKSSVTVYRUtJGIFIPVEDLNIENPKTCEAFNIGFSPYFVITH 

SEG 

PRD cceeeeeccccccceeecccceeeecccceeeccceeeeccccccceeecccccceeeee 

45 

SEA EERNEEKPSLNSVFTFTSGFKLFLV(3TIIILESSflVRYFTSDS(3DYLIIASJjRDDSELT(2 

SEG 

PRD hhhhhcccceeeeeeecccceeeeeceeecccccceeeeccccceeeeeeecccccceee 

50 SE<2 VFRUNGGSFVLHflKLPVRGVLTVALFNKGGSVFLAISflANARLNSLLFRUSGSGFINFdE 

SEG 

PRD eeeeccceeeeeeccccceeeeeeeeccccceeeeeeehhhhhheeeeeecccccceeee 

SEfi VPVSGTTEVEALSSANDIYLIFAKNVFLGDdNSIDIFIWEHGrJSSFRYFflSVDFAAVNRI 

55 SEG 

PRD eeccccceeeeccccceeeeeeeeeeeecccceeeeeeeeccccceeeeeeccceeeece 

SE(3 HSFTPASGIAHILLIG<2DMSALYCUNSERNtfFSFVLEVPSAYDVASVTVKSLNSSKNLIA 
SEG 

PRD eecccccceeeeeeeccccceeeeecccccceeeeeeeccccceeeeeeeccccccceee 

SE(3 LVGAHSHIYELAYISSHSDFIPSSGELIFEPGEREATIAVNILDDTVPEKEESFKV<2LKN 

5 SEG 

PRD eeccceeeeeeeeeeccccccccceeeeecccchhhhheeeeeccccccccceeeeeeec 

SEC PKGGAEIGINDSVTITILSNDDAYGIVAFAfiNSLYKflVEEnEflDSLVTLNVERLICGTYGR 

SEG 

10 PRD ccccceeecccceeeeeecccccchhhhhhccchhhhhhhhhhhhhhhhhhhccccceee 

SEC ITIAUEADGSISDIFPTSGVILFTEGfivLSTITLTILADNIPELSEVVIVTLTRITTEGV 

SEG 

PRD eeeeeeeccceeeeeccccceeeeccccccceeeeeecccceeeeeeeeeeeeeeceeee 

15 

SEC EDSYKGATID(JDRSKSVITTLPNDSPFGLVGWRAASVFIRVAEPKENTTTL<JL<3IARDKG 

SEG 

PRD cceeeeeeeecccceeeeeecccccccceeehhhhhheeeeeccccccccceeeeccccc 

20 SEA LLGDIAIHLRA(3PNFLLHVDN(2ATENEDYVL(3ETIIiriKENIKEAHAEVSILPDDLPELE 

SEG 

PRD ccccccceeecccceeeeeccccccccceeeeeeeeeecccchhhhheeeeccccccccc 

SEfl EGFIVTITEVNLVNSDFSTGdJPSVRRPGMEIAEIIIIEENDDPRGIFIIFHVTRGAGEVITA 

25 SEG 

PRD cceeeeeeeeeeccccccccccccccccchhhhhhhhcccccceeeeeeeeccccceeee 

SEfl YEVPPPLNVLflVPVVRLAGSFGAVNVYWKASPDSAGLEDFKPSHGILEFADKfiVTAFIIEI 

SEG • -xxxxxxxxxxxxxxx 

30 PRD eeccccceeeeeeeeeecccccceeeeeeccccccccccccccceeeeecccceeeecce 

SE(3 TIIDDAEFELTETFNISLISVAGGGRLGDDVVVTVVIPdNDSPFGVFGFEEKTVS 

SEG 

PRD eeechhhhhhhhcceeeeeeecccccccceeeeeeeecccccccceeeecccccc 
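Each PRD row gives one predicted secondary-structure state per residue of the SEQ row above it. Assuming the usual Pedant convention (`h` helix, `e` strand, `c` coil) and that the PRD rows have been concatenated into one string, a minimal Python sketch of how the overall composition can be tallied follows; the function name `ss_composition` is hypothetical.

```python
from collections import Counter


def ss_composition(prd: str) -> dict[str, float]:
    """Fraction of residues predicted as helix (h), strand (e) or coil (c).

    Any other character (for example layout spaces) is ignored.
    """
    states = [ch for ch in prd if ch in "hec"]
    if not states:
        return {"h": 0.0, "e": 0.0, "c": 0.0}
    counts = Counter(states)
    total = len(states)
    return {state: counts[state] / total for state in "hec"}


if __name__ == "__main__":
    # First PRD row of the report above.
    row = "ccchhhhhhhhchhhhhhhhhhheeeeeecccccceeeeeeeccccccccceeeecccee"
    for state, fraction in ss_composition(row).items():
        print(f"{state}: {fraction:.2f}")
```

Counting over the concatenated string, rather than row by row, avoids weighting the shorter final row differently from the others.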

35 

Prosite for DKFZphamy2_10p7.3

PS00079 151->172 MULTICOPPER_OXIDASE1 PDOC00076

(No Pfam data available for DKFZphamy2_10p7.3)
group: transmembrane protein

DKFZphamy2_lld2 encodes a novel 552 amino acid protein without
similarity to known proteins.

The novel protein contains 2 transmembrane regions.

No informative BLAST results; no predictive PROSITE, Pfam or
SCOP motifs.

The new protein can find application in studying the expression
profile of amygdala-specific genes and as a new marker for
amygdala cells.



20 



unknown protein

Pedant: TRANSMEMBRANE 2



Sequenced by EMBL
Locus: /map="16p13.3"



Insert length: 2939 bp
Poly A stretch at pos. 2920, polyadenylation signal at pos. 2867

1 GGCGGGTGAG AGGCCGCGGC GGCAGGTCCA CCTGGGCTTG CGAAGGCACA
51 GATTCCCCGT CCACAGCTCA CGACCAGATG CACCAGCAGG AGTCCACATC
101 GAGGACGTCC TCCGGGCACT CCCACGACCA GTGACCAGGA GTTAAACTTT
151 GGGATGTGCC CGTGATGTTG GACCACAAGG ACTTAGAGGC CGAAATCCAC
201 CCCTTGAAAA ATGAAGAAAG AAAATCGCAG GAAAATCTGG GAAATCCATC
251 AAAAAATGAG GATAACGTGA AAAGCGCGCC TCCACAGTCC CGGCTCTCCC
301 GGTGCCGAGC GGCGGCGTTT TTTCTTTCAT TGTTTCTCTG CCTTTTTGTG
351 GTGTTCGTCG TCTCATTCGT CATCCCGTGT CCAGACCGGC CGGCGTCACA
401 GCGAATGTGG AGGATAGACT ACAGTGCCGC TGTTATCTAT GACTTTCTGG
451 CTGTGGATGA TATAAACGGG GACAGGATCC AAGATGTTCT TTTTCTTTAT
501 AAAAACACCA ACAGCAGCAA CAATTTCAGC CGATCCTGTG TGGACGAAGG
551 CTTTTCCTCT CCCTGCACCT TTGCAGCTGC TGTGTCGGGG GCCAACGGCA
601 GCACGCTCTG GGAGAGACCT GTGGCCCAAG ACGTGGCCCT CGTGGAGTGT
651 GCTGTGCCCC AGCCAAGAGG CAGTGAGGCA CCTTCTGCCT GCATCCTGGT
701 GGGCAGACCC AGTTCTTTCA TTGCAGTCAA CTTGTTCACA GGGGAAACCC
751 TGTGGAACCA CAGCAGCAGC TTCAGCGGGA ATGCGTCCAT CCTGAGCCCT
801 CTGCTGCAGG TGCCTGATGT GGACGGCGAT GGGGCCCCAG ACCTGCTGGT
851 TCTCACCCAG GAGCGGGAGG AGGTTAGTGG CCACCTCTAC TCCGGCAGCA
901 CCGGGCACCA GATTGGCCTC AGAGGCAGCC TTGGTGTGGA CGGGGAAAGT
951 GGCTTCCTCC TTCACGTCAC CAGGACAGGT GCCCACTACA TCCTCTTTCC
1001 CTGCGCAAGC TCCCTCTGCG GCTGCTCTGT GAAGGGTCTC TACGAGAAGG
1051 TGACCGGGAG CGGCGGCCCG TTCAAGAGTG ACCCGCACTG GGAGAGCATG
1101 CTCAATGCCA CCACCCGCAG GATGCTTTCC CACAGCTCTG GAGCAGTGCG
1151 CTACCTGATG CATGTCCCAG GGAACGCCGG TGCAGATGTG CTTCTTGTGG
1201 GCTCAGAGGC CTTCGTGCTG CTGGACGGGC AGGAGCTGAC GCCTCGCTGG
1251 ACACCCAAGG CAGCCCATGT CCTGAGAAAA CCCATCTTCG GCCGCTACAA
1301 ACCAGACACC TTGGCTGTAG CCGTTGAAAA CGGAACTGGC ACCGACAGAC
1351 AGATCCTGTT TCTGGACCTT GGCACTGGAG CCGTCCTGTG TAGCCTAGCC
1401 CTCCCGAGCC TCCCTGGGGG TCCACTGTCC GCCAGCCTGC CGACCGCAGA
1451 CCACCGCTCA GCCTTCTTCT TCTGGGGCCT CCACGAGCTG GGGAGCACCA
1501 GCGAGACGGA GACCGGGGAG GCCCGGCACA GCCTGTACAT GTTCCACCCC
1551 ACCCTGCCGC GCGTGCTGCT GGAGCTGGCC AATGTCTCTA CCCACATTGT
1601 CGCCTTTGAC GCCGTCCTGT TTGAGCCAAG CCGCCACGCC GCCTACATCC
1651 TTCTGACAGG CCCGGCAGAC TCAGAGGCAC CCGGCCTGGT CTCTGTGATC
1701 AAGCACAAGG TGCGGGACCT TGTCCCAAGC AGCAGGGTGG TCCGCCTGGG
1751 TGAGGGTGGG CCAGACAGTG ACCAAGCCAT CAGGGACCGG TTCTCCCGGC
1801 TGCGGTACCA GAGTGAGGCG TAGAGGCACG CCAGCCAGAG CCTGTGGAGA
1851 GACTCCGCCT GCTGACACTA AACGTCCTGG GAAGTGGGCC CTTCCCTGGG
1901 TCTCTGCACT GACTCCCCCA CTCCTGACCC TGGTGATGGT CGCCACTGGG
1951 CAGCAGCAGC CTTACCAGTC CTCCATGATC ACACCCAGGG ACCTGCATGG
2001 GTGAGGGGAC ACCCTGGGCC TCTCTCCCGC CCAGCATCCT CCCTGAGTCC
2051 CCACACAGGG CCTCACTCTG CACCCCACCA GGGTCCCGCT CACACCAGGC
2101 AGCCTTCATA GTGGTCTCCC TGGCCACCTT GGGCAGAGCT GGGTCATGCA
2151 GCACCCCATC CTTACCCGGT GCCCTCTCCT TGCCAGCTTC TCCCCAGGCC
2201 AGAGCGGCCA TCGCGTAGAA AGAACCAGGG TGTCCCCGGG ACAGGCCGTC
2251 CCCCACCCCA TCCTGTAGAG TCCATTCCCC TTTTCCCTCC TGTGCTCTGT
2301 CCCCCAAGGA GTCATGGAAC TCAGGGTACT GGGCCTCAAC GGGAACCTGA
2351 GACAGCTTCC AGCTTCGCAG CCCTTCCCGG AGCTACAGGG GGATCCTCTA
2401 GCATGGGGGG TGTGACTTGG TTCCTTTGAC CAGGTCCTGT GAGGAAGCCT
2451 GGAGCAAGGG TCTCCCCCAG CAGGATGGGT GGGGCCTGCT CTGGAGCTGA
2501 GCCCGTGGCC GCTCACAGGT GTCCTTAGTG GTGTTGCAGC TGTCTACTGG
2551 CTGCATGTGC TGTGAATATC CCAAGGAACT GGCTGTGGAA TGCGTGTTTG
2601 GGTCAGTCTG TGCCCTCTCA GTAGACACTG GAGCTGCTCT GTCCCTGAAG
2651 AGGCCCCGTG CCCCAGGCAT GGCAAGCGCC TGCCTCTCCC CTTCCGGTGC
2701 TCACACGCCC ACGCCGTGCC ACCCGATGCA GGACTCACCT CTGTGCCTTG
2751 CTGCTCCTGA GGCCCAAGGG CAGCCATGGT GCTCTGTACT GCTCGGGCCG
2801 CCCAGGTCAC AGAGCCTGAG CTTCGTAGCC AAAGCAGCCT GATGACCCAC
2851 CCACCAAGGA AGAAAGCAGA ATAAACATTT TTGCACTGCC TGAAAAACCC
2901 CGGTGGTCAG GCGTGAGCCT AAAAAAAAAA AAAAAAAAA
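The header of this entry gives positions for a poly(A) stretch and a polyadenylation signal. A minimal Python sketch of how such features can be located in a cDNA string follows. It assumes the signal is one of the common hexamers AATAAA or ATTAAA lying within a fixed window upstream of the 3'-terminal run of A's; the function name, the 8-nt minimum tail and the 40-nt window are illustrative choices. The sketch is not claimed to reproduce the program's own positions, which may use a different coordinate convention.

```python
def find_polya_features(seq: str, min_tail: int = 8, window: int = 40,
                        hexamers: tuple[str, ...] = ("AATAAA", "ATTAAA")):
    """Return (tail_start, signal_start) as 1-based positions (None if absent).

    tail_start: first base of the 3'-terminal run of A's, if it is at least min_tail long.
    signal_start: start of the candidate hexamer closest to the tail, searched only
    within `window` bases upstream of the tail.
    """
    seq = seq.upper()
    i = len(seq)
    while i > 0 and seq[i - 1] == "A":
        i -= 1
    if len(seq) - i < min_tail:
        return None, None
    tail_start0 = i  # 0-based index of the first A of the tail
    lo = max(0, tail_start0 - window)
    region = seq[lo:tail_start0]
    best = -1
    for hexamer in hexamers:
        best = max(best, region.rfind(hexamer))  # rfind: hexamer nearest the tail
    signal = lo + best + 1 if best >= 0 else None
    return tail_start0 + 1, signal
```

The search window is kept separate from the tail so that A-rich sequence inside the tail itself is never reported as a signal.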



BLAST Results 

35 

No BLAST result 



Medline entries



No Medline entry 

45 

Peptide information for frame 2 



ORF from 2555 bp to 2839 bp, peptide length: 95
Category: questionable ORF 
Classification: unclassified 

1 MCCEYPKELA VECVFGSVCA LSVDTGAALS LKRPRAPGMA SACLSPSGAH 
51 TPTPCHPMQD SPLCLAAPEA QGQPWCSVLL GPPRSQSLSF VAKAA
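ORF coordinates and peptide lengths such as these come from scanning reading frames for an ATG followed by an in-frame stop codon. A minimal Python sketch of a three-frame, forward-strand scan follows. It assumes that the reported end coordinate is the last base before the stop codon and that only the first ATG opens each ORF; the function name and the `min_aa` cutoff are illustrative assumptions.

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_aa: int = 1):
    """Scan the three forward frames for ATG...stop ORFs.

    Returns sorted (start, end, frame, n_aa) tuples with 1-based, inclusive
    coordinates; `end` is the last base before the stop codon. ORFs that run
    off the end of the sequence without a stop codon are not reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((start + 1, i, frame, n_aa))
                start = None
    return sorted(orfs)
```

Under this convention an ORF spanning bases s to e encodes (e - s + 1) / 3 residues; for example, bases 100 to 399 would encode 100 residues.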
BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_lld2, frame 2

TREMBL:miIGCF_2 Mouse ig gamma2a-b (c57bl/6 allele) c gene and
secreted tail, N = 1, Score = 73, P = 0.1

10 

>TREMBL:I1HIGCF_5 Mouse ig gamma2a-b (c57bl/b allele) c gene and 
secreted 

tail • 

15 Length = 33H 

HSPs: 

Score = 73 (11.0 bits), Expect = 1.1e-01, P = 1.0e-01
Identities = 16/49 (32%), Positives = 27/49 (55%)

Query: 44 LSPSGAHTPTPCHPMQDSPLCLAAPEAQGQPWCSVLLGPPRSQSLSFVA 92

+ P T PC P+++ P C AAP+ G P SV + PP+ + + ++ 
Sbjct: 'lb IEPRVPITflNPCPPLKECPPC-AAPDLLGGP — SVFIFPPKIKDVLHIS 
25 mi 
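The alert hit above reports both an expectation (Expect) and a P value. Assuming the usual Karlin-Altschul treatment, in which the number of chance HSPs scoring at least this well is Poisson-distributed with mean E, the two are related as follows. This is a worked relation, not a description of the program's internals.

```latex
\[
  P = 1 - e^{-E}, \qquad E = -\ln(1 - P)
\]
```

For small E the two values nearly coincide; for example, E = 0.11 gives P = 1 - e^{-0.11}, approximately 0.104.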



Peptide information for frame 3 

30 

ORF from 165 bp to 1820 bp, peptide length: 552
Category: putative protein 

Classification: Transmembrane proteins unclassified 

35 

1 MLDHKDLEAE IHPLKNEERK SQENLGNPSK NEDNVKSAPP QSRLSRCRAA

51 AFFLSLFLCL FVVFVVSFVI PCPDRPASQR MWRIDYSAAV IYDFLAVDDI

1Q1 NGDRICDVLF LYKNTNSSNN FSRSCVDEGF SSPCTFAAAV SGANGSTLUE 

151 RPVAflDVALV ECAVPflPRGS EAPSACILVG RPSSFIAVNL FTGETLUNHS 

40 2 01 SSFSGNASIL SPLLflVPDVD GDGAPDLLVL TflEREEVSGH LYSGSTGHfll 

251 GLRGSLGVDG ESGFLLHVTR TGAHYILFPC ASSLCGCSVK GLYEKVTGSG 

3D1 GPFKSDPHWE SHLNATTRRM LSHSSGAVRY LflHVPGNAGA DVLLVGSEAF 

351 VLLDGflELTP RUTPKAAHVL RKPIFGRYKP DTLAVAVENG TGTDRtJILFL 

^01 DLGTGA VLCS LALPSLPGGP LSASLPTADH RSAFFFUGLH ELGSTSETET 

45 M51 GEARHSLYMF HPTLPRVLLE LANVSTHIVA FDAVLFEPSR HAAYILLTGP 

501 ADSEAPGLVS VIKHKVRDLV PSSRVVRLGE GGPDSDOAIR DRFSRLRYflS 
551 EA 
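The peptide above is the conceptual translation of the frame 3 ORF under the standard genetic code. A minimal Python sketch of that translation follows. The function name `translate` is hypothetical, only the standard nuclear code is handled, translation stops at the first stop codon, and codons containing ambiguous bases become `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(orf: str) -> str:
    """Translate an in-frame ORF, stopping at the first stop codon."""
    orf = orf.upper()
    peptide = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


if __name__ == "__main__":
    # First six codons of the frame 3 ORF (bases 165-182 of the listing).
    print(translate("ATGTTGGACCACAAGGAC"))  # MLDHKD
```

Building the table from the 64-letter string keeps the code compact, and the codon order is fixed by `product`, which varies the third base fastest.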



50 

BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphamy2_lld2, frame 3

No Alert BLASTP hits found 
Report for DKFZphamy2_lld2.2

[LENGTH] 95

[MW] 9757.33
[pI] 6.68
[BLOCKS] PR00521E
[KW] Alpha_Beta

SEQ MCCEYPKELAVECVFGSVCALSVDTGAALSLKRPRAPGMASACLSPSGAHTPTPCHPMQD
PRD cccchhhhhhhhhccceeeeeecccchhhhhccccccccccccccccccccccccccccc

SEQ SPLCLAAPEAQGQPWCSVLLGPPRSQSLSFVAKAA
PRD ccccccccccccccceeeeccccccchhhhhhccc

(No Prosite data available for DKFZphamy2_lld2.2)
(No Pfam data available for DKFZphamy2_lld2.2)

25 

Pedant information for DKFZphamy2_lld2, frame 3
Report for DKFZphamy2_lld2.3

[LENGTH] 552

EpU 5«AM 

[BLOCKS] PR00211G

[BLOCKS] BL00288C Tissue inhibitors of metalloproteinases proteins

[BLOCKS] PR00436A

[KW] TRANSMEMBRANE 2

[KW] LOW_COMPLEXITY 6.15 %



SE(3 MLDHOLEAEIHPLKNEERKSflENLGNPSKNEDNVKSAPPflSRLSRCRAAAFFLSLFLCL 

SEG xxxxxxxxx 

45 PRD ccchhhhhhhcccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhh 

MEM MMMMMMM 

SE<2 FVVFVVSFVIPCPDRPASflRMURIDYSAAVIYDFLAVDDINGDRIflDVLFLYKNTNSSNN 

SEG xxxxxxxxx 

50 PRD hhhhhhhccccccccccchhhhhhhchhhhhhhhhccccccccchhhhhhhccccccccc 

MEM MMMMMMMMMM 

SEfl FSRSCVDEGFSSPCTFAAAVSGANGSTLUERPVAfiDVALVECAVPflPRGSEAPSACILVG 

SEG 

55 PRD ccccccccccccccccccccccccccccccccccchhhhhhhccccccccccceeeeeec 

MEM 

SE(2 RPSSFIAVNLFTGETLUNHSSSFSGNASILSPLLflVPDVDGDGAPDLLVLT<3EREEVSGH 
SEG 

PRD cccceeeeeccccccccccccccccceeeecceeecccccccccccchhhhhhhhhhhcc 

MEM 

5 SE(3 LYSGSTGHtJIGLRGSLGVDGESGFLLHVTRTGAHYILFPCASSLCGCSVKGLYEKv'TGSG 

SEG 

PRD cccccccccccccccccccccceeeeeeecccceeeeeccccccccccceeeeccccccc 

HEM 

10 SEd GPFKSDPHWESMLNATTRRNLSHSSGAVRYLNHVPGNAGADVLLVGSEAFVLLDGiaELTP 

SEG 

PRD ccccccccccccchhhhhhhhhcccccceeeccccccccceeeeeccceeeeeccccccc 

MEM 

15 SE<3 RWTPKAAHVLRKPIFGRYKPDTLAVAVENGTGTDRfllLFLDLGTGAVLCSLALPSLPGGP 

SEG xxxxxxxxxxx 

PRD ccchhhhhhcccccccccccceeeeeeccccccceeeeeeeccccceeeeeeeccccccc 

mem ririririnririririnririiinMnn 

20 SE(2 LSASLPTADHRSAFFFUGLHELGSTSETETGEARHSLYI1FHPTLPRVLLELANVSTHIVA 

SEG xxxxxx xxxxxxxxxx 

PRD ccccccccccccceeeccccccccccccccccccccceeeccccccccccccccceeeee 

heii 

25 se<2 fdavlfepsrhaayilltgpadseapglvsvikhkvrdlvpssrvvrlgeggpdsdfiair 

SEG 

PRD eeeeeeccccceeeeeecccccccccceeeeeecccccccccceeeeecccccccchhhh 

HEM 

30 SEC! DRFSRLRYGSEA 

SEG 

PRD hhhhhhhhhccc 

F1EM 
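The MEM rows mark residues predicted to lie in membrane-spanning segments with `M`. Assuming the MEM rows are concatenated into one per-residue string aligned with SEQ (blank where nothing is predicted), contiguous runs of `M` give the predicted transmembrane regions, and a segment split across two report rows is rejoined by the concatenation. A minimal Python sketch follows; the function name and the `min_len` cutoff are illustrative assumptions.

```python
import re


def tm_segments(mem: str, min_len: int = 1) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans of consecutive 'M' marks in a MEM track."""
    return [(m.start() + 1, m.end())
            for m in re.finditer(r"M+", mem)
            if m.end() - m.start() >= min_len]


if __name__ == "__main__":
    # Hypothetical track: one segment broken across two 60-residue rows, one elsewhere.
    track = "   MMMMMMM" + "MMMMMMMMMM  " + " " * 20 + "MMMMMMMMMMMMMMMMMMM  "
    spans = tm_segments(track, min_len=15)
    print(len(spans), spans)  # 2 [(4, 20), (43, 61)]
```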

35 

(No Prosite data available for DKFZphamy2_lld2.3)

(No Pfam data available for DKFZphamy2_lld2.3)



-55- 



WO 01/98454 
DKFZphamy2_lln4



PCT/IB01/02050 



group: nucleic acid management

DKFZphamy2_lln4 encodes a novel 1091 amino acid protein with
similarity to RAD18 of Schizosaccharomyces pombe and YLR383w of
Saccharomyces cerevisiae.

The novel protein contains an ATP/GTP-binding site motif A (P-
loop). It is similar to RAD18, which acts in a DNA repair pathway
for the removal of UV-induced DNA damage. YLR383w of Saccharomyces
cerevisiae is a recombination repair protein.

The new protein can find application in the modulation of DNA repair
and as a new tool for the manipulation of nucleic acids.

similarity to RAD18 (Schizosaccharomyces pombe)

comment on PS3b15:

FUNCTION: ACTS IN A DNA REPAIR PATHWAY FOR REMOVAL OF UV-INDUCED
DNA DAMAGE THAT IS DISTINCT FROM CLASSICAL NUCLEOTIDE EXCISION
REPAIR AND IN REPAIR OF IONIZING RADIATION DAMAGE.

25 

Sequenced by EMBL 

Locus: /map="2 n 

Insert length: 3679 bp

Poly A stretch at pos. 3646, polyadenylation signal at pos. 3620

1 ACCGCGGTGG GCGCCGGGGC TCCCGGGAAT CTACCTTCTC CTGCGGCCGG
51 CACGCGGTTC CCAGGGGGCC AGCGGCGGTC AGCCGAGGTC GAGACGCCCG
101 CAGGGTGGCC TTAGCGGCCG GTCGTACCAC GGCAGCCCCG CCGATCAGGT
151 TCCTTTGGGA GACTTCGACT TGTTGGCGAA ATGAACCGGA GAAGAATCCC
201 AATTGGGAAT TGCGGAAAAC AGGACTCTAG GGTAGAGAAA GGTTGTAGAA
251 CCAATAGGGT TTGAGACCTG ATGGCCAAAA GAAAGGAAGA AAATTTTTCC
301 TCTCCTAAAA ATGCCAAAAG GCCAAGACAA GAAGAATTGG AGGATTTTGA
351 TAAAGATGGT GACGAAGACG AATGTAAAGG TACTACTTTG ACTGCAGCAG
401 AAGTTGGAAT AATTGAGAGT ATTCACCTAA AAAACTTCAT GTGTCATTCA
451 ATGCTTGGAC CTTTTAAGTT TGGTTCTAAT GTCAACTTTG TTGTTGGCAA
501 CAATGGAAGT GGGAAGAGTG CAGTACTCAC AGCTCTCATA GTCGGTCTTG
551 GTGGAAGAGC AGTTGCTACT AATAGAGGAT CCTCTTTAAA AGGTTTTGTG
601 AAAGATGGAC AGAACTCTGC AGATATCTCA ATAACATTGA GGAACAGAGG
651 AGATGATGCC TTTAAAGCCA GTGTGTATGG TAACTCTATA CTTATACAGC
701 AACACATCAG CATAGATGGA AGTCGATCTT ATAAACTTAA AAGTGCAACA
751 GGCTCCGTGG TTTCCACGAG GAAAGAAGAG CTGATTGCAA TTCTTGATCA
801 TTTTAACATC CAGGTGGATA ATCCAGTTTC TGTTTTAACA CAAGAAATGA
851 GCAAGCAGTT CTTACAGTCT AAAAATGAAG GAGACAAATA CAAATTCTTC
901 ATGAAAGCAA CGCAACTTGA ACAGATGAAG GAAGATTATT CATACATTAT
951 GGAAACGAAA GAAAGAACAA AGGAGCAGAT ACATCAAGGA GAAGAGCGGC
1001 TTACTGAACT AAAGCGCCAG TGTGTAGAGA AAGAGGAACG TTTTCAAAGT
1051 ATTGCTGGTT TAAGTACAAT GAAGACTAAT TTAGAGTCCT TGAAACATGA
1101 AATGGCTTGG GCAGTGGTCA ATGAAATTGA AAAACAATTG AATGCCATCA
1151 GAGATAATAT CAAAATTGGA GAAGATCGTG CTGCTAGACT TGACAGGAAA
1201 ATGGAAGAAC AGCAGGTCAG ACTTAATGAG GCAGAACAAA AGTACAAGGA
1251 TATTCAAGAC AAACTAGAAA AGATTAGTGA AGAGACAAAT GCACGAGCAC
1301 CAGAATGTAT GGCATTGAAA GCAGATGTTG TTGCTAAGAA AAGGGCCTAT
1351 AATGAAGCTG AGGTTTTATA TAACCGATCC TTAAACGAAT ATAAAGCATT
1401 AAAGAAAGAT GATGAGCAGC TTTGTAAACG AATTGAAGAG CTGAAAAAAA
1451 GTACTGACCA ATCTTTGGAA CCTGAACGGT TGGAAAGACA AAAAAAAATA
1501 TCTTGGTTAA AAGAGAGAGT AAAGGCCTTT CAAAATCAAG AAAATTCAGT
1551 CAATCAAGAG ATCGAACAGT TTCAGCAAGC CATAGAAAAG GACAAAGAAG
1601 AACATGGCAA AATTAAGAGA GAAGAATTAG ATGTGAAGCA TGCACTGAGC
1651 TACAATCAGA GGCAACTGAA AGAATTGAAA GATAGTAAAA CTGATCGACT
1701 CAAAAGATTT GGCCCTAATG TTCCAGCTCT TCTTGAAGCC ATAGATGATG
1751 CTTATAGACA AGGACATTTT ACCTATAAAC CTGTAGGCCC TTTAGGAGCT
1801 TGCATTCATC TTCGGGACCC AGAACTTGCT TTGGCTATTG AATCTTGCTT
1851 AAAAGGGCTT CTGCAGGCCT ATTGTTGCCA TAATCATGCT GATGAAAGGG
1901 TCCTTCAGGC ACTCATGAAA AGGTTTTATT TACCAGGGAC CTCACGGCCA
1951 CCGATAATAG TTTCTGAGTT TCGGAATGAG ATATATGATG TAAGACACAG
2001 AGCTGCTTAT CATCCAGACT TTCCAACAGT TCTGACAGCT TTAGAAATAG
2051 ATAATGCGGT TGTGGCAAAT AGCCTAATTG ACATGAGAGG CATAGAGACA
2101 GTGCTACTAA TCAAAAATAA TTCTGTAGCT CGTGCAGTAA TGCAGTCCCA
2151 AAAGCCACCC AAAAATTGTA GAGAAGCTTT TACTGCTGAT GGTGATCAAG
2201 TTTTTGCAGG ACGTTATTAT TCATCTGAAA ATACAAGACC TAAGTTCCTA
2251 AGCAGAGATG TGGATTCTGA AATAAGTGAC TTGGAGAATG AGGTTGAAAA
2301 TAAGACGGCC CAGATATTAA ATCTTCAGCA ACATTTATCT GCCCTTGAAA
2351 AAGATATTAA ACACAATGAG GAACTTCTTA AAAGGTGCCA ACTACATTAT
2401 AAAGAACTAA AGATGAAAAT AAGAAAAAAT ATTTCTGAAA TTCGGGAACT
2451 TGAGAACATA GAAGAACACC AGTCTGTAGA TATTGCAACT TTGGAAGATG
2501 AAGCTCAGGA AAATAAAAGC AAAATGAAAA TGGTTGAGGA ACATATGGAG
2551 CAACAAAAAG AAAATATGGA GCATCTTAAA AGTCTGAAAA TAGAAGCAGA
2601 AAATAAGTAT GATGCAATTA AATTCAAAAT TAATCAACTA TCGGAGCTAG
2651 CAGACCCACT TAAGGATGAA TTAAACCTTG CTGATTCTGA AGTGGATAAC
2701 CAAAAACGAG GGAAACGACA TTATGAAGAA AAACAAAAAG AACACTTGGA
2751 TACCTTAAAT AAAAAGAAAC GAGAACTGGA TATGAAAGAG AAAGAACTAG
2801 AGGAGAAAAT GTCACAAGCA AGACAAATCT GCCCAGAGCG TATAGAAGTA
2851 GAAAAATCTG CATCAATTCT GGACAAAGAA ATTAATCGAT TAAGGCAGAA
2901 GATACAGGCA GAACATGCTA GTCATGGAGA TCGAGAGGAA ATAATGAGGC
2951 AGTACCAAGA AGCAAGAGAG ACCTATCTTG ATCTGGATAG TAAAGTGAGG
3001 ACTTTAAAAA AGTTTATTAA ATTACTGGGA GAAATCATGG AGCACAGATT
3051 CAAGACATAT CAACAATTTA GAAGGTGTTT GACTTTACGA TGCAAATTAT
3101 ACTTTGACAA CTTACTATCT CAGCGGGCCT ATTGTGGAAA AATGAATTTT
3151 GACCACAAGA ATGAAACTCT AAGTATATCA GTTCAGCCTG GAGAAGGAAA
3201 TAAAGCTGCT TTCAATGACA TGAGAGCCTT GTCTGGAGGT GAACGTTCTT
3251 TCTCCACAGT GTGTTTTATT CTTTCCCTGT GGTCCATCGC AGAATCTCCT
3301 TTCAGATGCC TGGATGAATT TGATGTCTAC ATGGATATGG TTAATAGGAG
3351 AATTGCCATG GACTTGATAC TGAAGATGGC AGATTCCCAG CGTTTTAGAC
3401 AGTTTATCTT GCTCACACCT CAAAGCATGA GTTCACTTCC ATCCAGTAAA
3451 CTGATAAGAA TTCTCCGAAT GTCTGATCCT GAAAGAGGAC AAACTACATT
3501 GCCTTTCAGA CCTGTGACTC AAGAAGAAGA TGATGACCAA AGGTGATTTG
3551 TAACTTAACA TGCCTTGTCC TGATGTTGAA GGATTTGTGA AGGGAAAAAA
3601 AATTCTGGAC TCTTTGATAT AATAAAATGA GACTGGAGGC ATTCTGAAAA
3651 AAAAAAAAAA AAAAAAAAAA AAAAAAAAA



BLAST Results 



55 No BLAST result 



Medline entries 
Lehmann AR, Walicka M, Griffiths DJ, Murray JM, Watts FZ,
McCready S, Carr AM. The rad18 gene of Schizosaccharomyces pombe
defines a new subgroup of the SMC superfamily involved in DNA
repair. Mol Cell Biol 1995 Dec;15(12):7067-80.

=113601137:

Mengiste T, Revenkova E, Bechtold N, Paszkowski J. An SMC-like
protein is required for efficient homologous recombination in
Arabidopsis. EMBO J 1999 Aug 16;18(16):4505-12.



Peptide information for frame 1 
20 

ORF from 271 bp to 3543 bp, peptide length: 1091
Category: similarity to known protein
Classification: Nucleic acid management
Prosite motifs: RGD (126-128)
ATP_GTP_A (76-83)



1 MAKRKEENFS 

30 51 IHLKNFMCHS 

101 NRGSSLKGFV 

151 SRSYKLKSAT 

201 KNEGDKYKFF 

251 CVEKEERF<2S 

35 301 EDRAARLDRK 

351 ADVVAKKRAY 

401 PERLERtJKKI 

451 EELDVKHALS 

501 TYKPVGPLGA 

40 551 RFYLPGTSRP 

bOl SLIDMRGIET 

b51 SSENTRPKFL 

701 ELLKRCflLHY 

751 KflKdVEEHME 

45 601 LNLADSEVDN 

flSl RUICPERIEV 

101 TYLDLDSKVR 

151 (2RAYCGKMNF 

1001 LSLUSIAESP 

50 1051 (SSMSSLPSSK 



SPKNAKRPRfl EELEDFDKDG 
MLGPFKFGSN VNFVVGNNGS 
KDGC3NSADIS ITLRNRGDDA 
GSVVSTRKEE LIAILDHFNI 
MKATflLEflMK EDYSYIMETK 
IAGLSTHKTN LESLKHEMAU 
MEE<2<3VRLNE AEUKYKDICD 
NEAEVLYNRS LNEYKALKKD 
SULKERVKAF (JNflENSVNfJE 
YN(2R<2LKELK DSKTDRLKRF 
CIHLRDPELA LAIESCLKGL 
PIIVSEFRNE IYDVRHRAAY 
VLLIKNNSVA RAVIlCJSflKPP 
SRDVDSEISD LENEVENKTA 
KELXMKIRKN ISEIRELENI 
flflKENMEHLK SLKIEAENKY 
(2KRGKRHYEE KflKEHLDTLN 
EKSASILDKE INRLRflKIfiA 
TLKKFIKLLG EIMEHRFKTY 
DHKNETLSIS VflPGEGNKAA 
FRCLDEFDVY NDMVNRRIAri 
LIRILRMSDP ERGflTTLPFR 



DEDECKGTTL TAAEVGIIES 
GKSAVLTALI VGLGGRAVAT 
FKASVYGNSI LIflflHISIDG 
(2VDNPVSVLT C2EMSK(2FL<2S 
ERTKEfllHflG EERLTELKRfl 
AVVNEIEKflL NAIRDNIKIG 
KLEKISEETN ARAPECMALK 
DEdLCKRIEE LKKSTDtfSLE 
IE(2F<2<2AIEK DKEEHGKIKR 
GPNVPALLEA IDDAYRflGHF 
LfiAYCCHNHA DERVLflALIIK 
HPDFPTVLTA LEIDNAVVAN 
KNCREAFTAD GDfiVFAGRYY 
(JILNLflflHLS ALEKDIKHNE 
EEHflSVDIAT LEDEAflENKS 
DAIKFKINflL SELADPLKDE 
KKKRELDMKE KELEEKf1S<2A 
EHASHGDREE IflRflYflEARE 
CJflFRRCLTLR CKLYFDNLLS 
FNDI1RALSGG ERSFSTVCFI 
DLILKflADSfl RFRiJFILLTP 
PVTflEEDDPfl R 
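The Prosite motifs listed for this ORF are simple sequence patterns: RGD is `R-G-D`, and ATP_GTP_A (the P-loop) is `[AG]-x(4)-G-K-[ST]` in PROSITE notation. A minimal Python sketch follows that converts a subset of PROSITE pattern syntax to a regular expression and reports 1-based, possibly overlapping, match spans. The helper names are hypothetical. The sketch handles `x`, `(n)` and `(n,m)` repeats, `[...]`, `{...}` and the `<`/`>` anchors, but not the full PROSITE syntax.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE pattern (e.g. '[AG]-x(4)-G-K-[ST]') to a Python regex."""
    parts = []
    for element in pattern.strip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.group(1), m.group(2), m.group(3)
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        repeat = ""
        if lo and hi:
            repeat = "{" + lo + "," + hi + "}"
        elif lo:
            repeat = "{" + lo + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_motif(seq: str, pattern: str) -> list[tuple[int, int]]:
    """Return 1-based (start, end) spans of all matches, overlapping ones included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1))) for m in regex.finditer(seq)]


if __name__ == "__main__":
    print(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))  # [AG].{4}GK[ST]
    print(find_motif("MLGPFKFGSNVNFVVGNNGSGKSAVL", "[AG]-x(4)-G-K-[ST]"))  # [(16, 23)]
```

The lookahead wrapper lets overlapping occurrences be reported, which matters for short, degenerate patterns like the P-loop.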



BLASTP hits 

55 

No BLASTP hits available 
SWISSPROT:RA18_SCHPO DNA REPAIR PROTEIN RAD18, N = 1, Score =
1021, P
= 5.2e-103

5 

PIR:S51470 hypothetical protein YLR3fl3w - yeast (Saccharomyces 
cerevisiae) -i N = It Score = &E3-, P = Se-flE 



>SWISSPROT:RA18_SCHPO DNA REPAIR PROTEIN RAD18.

Length = 1,140

HSPs: 

Score = 1021 (153.2 bits), Expect = 5.2e-103, P = 5.2e-103
Identities = 315/10=11 (Efl*/.)-, Positives = 54D/1011 (4T/.) 

fluery: 5 AKRKEENFSSPKNAKRPRflEELEDF — DKDGDEDECKGTTLTAAE 

VGIIESIHLKN 55 

20 A R ++N ++ + +E ++DG+ D T T + 

VG+IE IHL N 
Sbjct: MS 

ASRNfiDNRPERflSRLfiRSSSLIElJVRGNEDGENDVLNflTRETNSNFDNRv'GVIECIHLv'N 10M 

25 fiuery: 5b 

FIICHSIILGPXXXXXXXXXXXXXXXXXXXAVLTALIVGLGGRAVATNRGSSLKGFVKDGflN 115 
FflCH L A+LT L + LG +A TNR ++K 

VK G+N 

Sbjct: 105 FMCHDSL- 
30 KINFGPRINFVIGHNGSGKSAILTGLTICLGAKASNTNRAPNNKSLVKflGKN lb3 

Ouery: lib 

SADISITLRNRGDDAFKASVYGNSILIfltfHISIDGSRSYKLKSATGSVVSTRKEELIAIL 
A IS+T+ NRG +A++ +YG SI I++ I +GS Y+L+S 

35 G+V+ST+++EL I 
Sbjct: lb4 

YARISVTISNRGFEAYflPEIYGKSITIERTIRREGSSEYRLRSFNGTVISTKRDELDNIC 
fluery: 17b 

40 DHFNiaVDNPVSVLTflEMSKflFLflSKNEGDICYKFFIIKATflLEflllKEDYSYiriETKERTKE 235 

DH +fi+DNP+++LT(3+ ++I3FL + + +KY+ FflK <2L+<3++E+YS I 

++ TIC 
Sbjct: 224 

DHHGLflIDNPHNILTc5DTAR(2FLGNSSPKEKY(3LFriKGI(2LKflLEENYSLIE(2SLINTtCN 263 

45 

(Juery: 23b 

(2IH(2GEERLTELKR(2CVEKEERF(2SIAGLSTnKTNLESLKHEt1AlilAVVNEIEK(2LNAIRD 215 
+ + ++ L ++ E + ++ + LE K EM UA V 

E+EK+L 
50 Sbjct: 2fli» 

VLGNKKTGVSYLAKKEEEYKLLUEl2SRETENLHNLLE<2KKGENVWA<2VVEVEKEL 33fl 

fiuery: 21b NIKIGEDRAARLDRKi1EE(2<2VRLNEAEc3KYKDI<3DKLEKISEETNARAP- 
ECflALKADVV 354 

55 + E + K+ E + L DI K+ EE RA E 

K+ 

Sbjct: 331 --LLAEKEFfiHAEVKLSEAKENLESIVTNflSDIDGKISS- 
KEEVIGRAKGETDTTKSKFE 315 
fluery: 3SS 

AKKRA YNEAEVLYNRSLNEYKALKKDDEfiLCKRIEELKKSTDflSLEPERLERflKKISWLK 414 

+ ++ Y +N+ K+D + I K D E ER 

5 ++ + 

Sbjct: 31b DIVKTFDG YRSEHNDVDICJKRDIflN 

SINAAKSCLDVYREflLNTERARENNLGG 4Mfl 

fluery: 415 ERVKAFtJN<2ENSVN<2EIE(2F-<3l2AIEK]>KE EHG 

10 KIKREELDVKHALS 4bO 

+++ N+ N++ +EI +<2 +E + + EG + ++ 

+ + +S 

Sbjct: 441 

S<2IEKRANESNNLi2REIADLSE<2IVELESKRNDLHSALLEI1GGNLTSLLTKKDSIANKIS SDfl 

15 

(2uery: 4bl 

YNflRflLKELKDSKTDRLKRFGPNVPALLEAIDDAYRfiGHFTYKPVGPLGACIHLRDPELA 5S0 

LK L+D + D++ FG N+P LL+ I R+ F + P GP+G + 

+++ + 

20 Sbjct: SOT DdSEHLKVLEDVflRDKVSAFGKNNPflLLKLIT--- 
RETRFi2HPPKGPriGKYriTVKE(2KlilH 5b5 

fluery: 521 

LAIESCLKGLL<2AYCCHNHADERVLi3ALf1KRFYLPGTSRPPIIVSEFRNEIYDVRHRAAY 5flD 
25 L IE L ++ + +H D+ +L+ LM++ T ++V + 

YD ++ 

Sbjct: 5bb LIIERILGNVINGFIVRSHHD<3LILKELf1R<2SNCHAT VVVGK 

YDPFDYSSG bib 

30 (Juery: 5fll HP»~ 

FPTVLTALEIDNAVVANSLIDMRGIETVLLIKNNSVARAVMflSfiKPPKNCREAFT b3fi 

PD +PTVL ++ J>+ V ++LI+ GIE +LLI++ A A f1+ + 

N •+ + 

Sbjct: bl7 EPDSfiYPTVLKIIKFDDDEVLHTLINHLGIEKNLLIEDRREAEAYtlK — 
35 RGIANVTflCYA b74 

fluery: b3T ADG-DflVFAGRYYSSENTR— PKFLSRDVDSEI— 
SDLENEVENKTAfllLNLflUHLSAL b^E 

J> ++ + R S++ + K + IS EEK L 

40 (2 + ++ 

Sbjct: b?S 

LDPRNRGYGFRIVSTflRSSGISKVTPUNRPPRIGFSSSTSIEAEKKILDDLKKlJYNFASN 734 

(3uery: b13 E-KDIKHNEELLKRCflLHYKELKHKIRKNIS-EIRELENIEEHfl-SV-D— 
45 IATLEDEA 745 

+ + K + KR + E I+K I + RE+ ++E + SV D 

I TLE 

Sbjct: 735 

flLNEAKIEfiAKFKRDEflLLVEKIEGIKKRILLKRREVNSLESflELSVLDTEKIQTLERRI 714 

50 

fluery: 74b fiENKSKflKMVEEHMEflfiKENMEH- 
LKSLKIEAENKYDAIKFKINflLSELADPLKDELN-L 6D3 

E + +++ ++ K N EH ++ + + + KI ++ 

L+ EL+ L 

55 Sbjct: 715 SETEKELESYAG(2L<2DAK- 

NEEHRIRDNt2RPVIEEIRIYREKI(2TET(2RLSSLl2TELSRL 653 
(2uery: flOM 

ADSEVDNl3KRGKRHYEEK(2KEHLmNXXXXXXXXXXXXXXXXXS<2AR<2ICPERIEVEKS flb3 
D + +++ +RH + + + L ++A C 

ER+ V+ S 

5 Sbjct: aSi» RDEKRNSEVMERH-R(2TVESCTNILREKEAKKVt2CA(JVVADYTAKANTRC- 
ERVPVULS 111 

fluery: fibM ASILDKEINRLRflKIflAEHASHG- 
DREEinRflYflEARETYLDLDSKVRTLKKFIKLLGEI 122 
10 + LD EI RL+ +1 G E+ Y A+E + V L + 

++ L E 

Sbjct: 112 

PAELDNEIERL(3H(2IAEL)RNRTGVSVEC3AAEDYLNAICEKHl>(2A<VLVARLTflLLl2ALEET 171 
15 fiuery: 123 

P1EHRFKTYl3(3FRRCLTLRCICLYFDNLLS(2RAYCGKMNFDHKNETLSISV(2PGEGNKA-AF Ifll 
+ R + + +FR+ +TLR K F+ LS<2R + GK+ H+ E L VP 

N A A 

Sbjct: 172 

20 LRRRNEMUTKFRKLITLRTKELFELYLS(2RNFTCKLVIKH(3EEFLEPRVYPANRNLATAH 1031 
Auery: 1B2 N 

DMRALSGGERSFSTVCFILSLUSIAESPFRCLDEFDVYflDMVNRRIAflDLIL 103M 

N ++ LSGGE+SF+T+C +LS+U P RCLDEFDV+MD VNR 

25 +++ +++ 

Sbjct: 1032 

NRHEKSKVSVlUGLSSGEKSFATICriLLSIIJEAIISCPLRCLDEFDVFriDAVNRLVSIKririV 1011 

fluery: 1035 KMADSflRFRflFILLTPflSnSSLPSSKLIRILRMSDPERGflTTLP 1073 
30 A +(2FI +TPl2 PI + K + + R+SDP + LP 

Sbjct: 1012 DSAKDSSDKl2FIFITPfl»HGt3IGL]>KDVVVFRLSDPVVSSSALP 1135 
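In the pairwise alignments above, the middle line between Query and Sbjct shows the residue itself where both sequences agree, `+` for a conservative (positively scoring) substitution, and a blank otherwise, including at gap columns. Identities and Positives are tallied from it over the aligned length. A minimal Python sketch follows. It assumes the three lines of one alignment block have already been extracted with equal length; the function name is hypothetical, and percentage rounding may differ from the BLAST program's own.

```python
def alignment_stats(query: str, midline: str, subject: str) -> dict:
    """Count identities and positives from one BLAST-style alignment block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("query, midline and subject must be the same length")
    length = len(midline)
    identities = sum(1 for ch in midline if ch.isalpha())  # identical residues
    positives = identities + midline.count("+")  # identities plus conservative pairs
    return {
        "length": length,
        "identities": identities,
        "positives": positives,
        "pct_identity": 100.0 * identities / length,
        "pct_positive": 100.0 * positives / length,
    }


if __name__ == "__main__":
    print(alignment_stats("AKRKEEN", "A R ++N", "ASRNQDN"))
```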



Pedant information for DKFZphamy2_lln4, frame 1

Report for DKFZphamy2_lln4.1

[LENGTH] 1091

[MW] 126326.13

[pI] 6.57

[HOMOL] SWISSPROT:RA18_SCHPO DNA REPAIR PROTEIN RAD18. 1e-101

45 CFUNCAT3 03.11 recombination and dna repair ICS- cerevisiaei 

YLR3fl3wJ le-flfl 

CFUNCAT1 Ofl.07 vesicular transport (golgi networki etc) CS- 
cerevisiaei YDLQSfiwJ 3e-lb 

CFUNCAT3 30. 03 organization of cytoplasm IS. cerevisiaei 

50 YDLDSflwJ 3e-lb 

EFUNCATJ 01-13 biogenesis of chromosome structure ICS- 
cerevisiaei YLR0fibw3 2e-14 

CFUNCAT3 1 genome replication! transcr iption-i recombination and 

repair CM- jannaschiii M JlbM33 3e-m 

55 EFUNCAT3 30-0*1 organization of cytoskeleton ICS - cerevisiaei 

YILlMlcJ Ie-12 

CFUNCAT1 03-25 cell cycle control and mitosis ICS • cerevisiaei 

YDR35bw3 fie-12 
CFUNCAT3 01-10 nuclear biogenesis CS. cerevisiaei YDR35bw3 

fie-lE 

CFUNCAT3 30-10 nuclear organization CS - cerevisiae-i YFLOOflwl 
3e-ll 

5 CFUNCAT3 11-04 dna repair (direct repairs base excision repair 
and nucleotide excision repair) CS- cerevisiae-i YKROTSuO Ee-01 
CFUNCAT3 unclassified proteins CS- cerevisiaei Y0R£lLic3 

Se-QT 

CFUNCAT3 03. E5 cytokinesis CS. cerevisiae-, YHR023w I1Y01 - 
10 myosin-1 isofornO fle-Ofl 

CFUNCAT3 03-04 budding-i cell polarity and filament formation 
CS. cerevisiae-. YHR0S3w MY01 - myosin-1 isofornO fle-Ofl 

EFUNCAT3) Ofl-EE cy toskeleton-dependent transport CS* cerevisiaei 

YHR023W MY01 - myosin-1 isofornO Be-Ofl 
15 CFUNCAT3 Ob- 07 protein modification (glycolsylationi acylationi 

myristylation-i palmi tylation-i f arnesylation and processing) 
CS. cerevisiae-. YKL201c3 Ee-07 

CFUNCAT3 D3-13 meiosis CS - cerevisiae-. YDR£fl5w3 4e-07 

CFUNCAT31 30-13 organization of chromosome structure CS- 
20 cerevisiae-. YDREASwJ Me-07 

CFUNCAT3 Tfl classification not yet clear-cut CS - cerevisiae-. 

YJR134c3 7e-07 

CFUNCAT3 Ob-10 assembly of protein complexes CS- cerevisiae-i 

YPR141c3 7e-07 

25 CFUNCAT3 30-05 organization of centrosome CS- cerevisiae-. 
YPR141c3 7e-07 

CFUNCAT3 11-01 stress response CS • cerevisiae-. YPR141c3 7e-D7 
CFUNCAT3 03-07 pheromone response! mating-type determination-, 
sex-specific proteins CS-- cerevisiae-. YPR141c3 7e-07 
30 CFUNCAT3 r general function prediction CH- influenzae-. 

HI075fc.3 le-Ob 

CFUNCAT3 10-05. other pheromone response activities CS. 
cerevisiaei YHR15ac3 Ee-Ob 

CFUNCAT3 05-0*4 translation (initiation! elongation and 
35 termination) CS. cerevisiae-. YAL035w3 3e-04 

CFUNCAT3 30-0E organization of plasma membrane CS. cerevisiae-. 
YEROQflcl 4e-Q4 

CFUNCAT3 Ofl-lb extracellular transport CS - cerevisiae-. 

YER00flc3 4e-04 

40 CFUNCAT3 01-04 biogenesis of cytoskeleton CS. cerevisiaei 
YKL171c3 7e-04 

CFUNCAT3 03- EE. 01 cell cycle check point proteins CS- 
cerevisiae-. YGL0fibw3 76-04 

CFUNCAT3 Ofl-01 nuclear transport CS- cerevisiaei YDLE07w3 D-001 
45 CFUNCAT3 04-07 rna transport CS- cerevisiae-. YDL207w3 0-001 

CBL0CKS3 BL0Q32fc>C Tropomyosins proteins 
CBL0CKS3 PR010D4B 

CBL0CKS3 BL001E1A Colipase proteins 

CBL0CKS3 PFOOSflOA 
50 CSC0P3 dEtmab_ 1-105-4-1-1 Tropomyosin Crabbit 

(Oryctolagus cuniculus) 3e-0fci 

CEC3 3-b-1.3E Myosin ATPase le-SO 

CPIRKU3 phosphotransferase 1e-lb 

CPIRKW3 nucleus Se-10 

55 CPIRKW3 blocked amino end Ee-07 

CPIRKU3 citrulline Ee-10 

CPIRKU3 tandem repeat Te-SD 

CPIRKU3 heterodimer 3e-ll 
EPIRKIO endocytosis Ee-13 

EPIRKIO heart Te-ED 

EPIRKIO polymorphism le-10 

EPIRKW J serine/threonine-specif ic protein kinase Te-lb 

5 EPIRKIO transmembrane protein fie-15 

EPIRKIO zinc finger Ee-13 

EPIRKIO metal binding Ee-13 

EPIRKIO DNA binding Se-Qb 

EPIRKIO muscle contraction Te-EO 

10 EPIRKIO acetylated amino end 3e-13 

EPIRKIO actin binding Te-EO 

EPIRKIO mitosis fie-10 

EPIRKIO microtubule binding Se-OT 

EPIRKIO chromosomal protein 3e-ll 

15 EPIRKIO ATP 1e-£0 

receptor Ee-Ob 

EPIRKIO thick filament le-EO 

EPIRKIO phosphoprotein Ee-m 

EPIRKIO glycoprotein le-10 

20 EPIRKIO skeletal muscle le-lfi 

EPIRKIO calcium binding Ee-10 

EPIRKIO alternative splicing 3e-15 

EPIRKIO DNA condensation 3e-ll 

EPIRKIO P-loop ^e-EO 

25 EPIRKIO coiled coil ^e-SD 

EPIRKIO heptad repeat le-10 

EPIRKIO methylated amino acid Te-EO 

EPIRKIO basement membrane le-10 

EPIRKIO immunoglobulin receptor Me-OT 

30 EPIRKIO peripheral membrane protein Ee-13 

EPIRKIO cardiac muscle Te-EO 

EPIRKIO extracellular matrix le-10 

EPIRKIO hydrolase ^e-EO 

EPIRKIO microtubule Ee-10 

35 EPIRKIO muscle 2e-m 

EPIRKIO membrane protein le-10 

EPIRKIO EF hand Ee-IQ 

EPIRKIO cell division fle-10 

EPIRKIO cytoskeleton le-13 

40 EPIRKIO hair Ee-10 

EPIRKIO calmodulin binding Ee-13 

EPIRKIO Golgi apparatus be-Ofi 

EPIRKIO smooth muscle Ee-0? 

ESUPFAM3 conserved hypothetical PUS protein Me-Sb 

45 ESUPFAfll myosin heavy chain Te-EO 

ESUPFAM3 unassigned Ser/Thr or Tyr-specific protein . kinases le- 
lb 

ESUPFAfO centromere protein E Se-OT 

ESUPFAfO calmodulin repeat homology Ee-10 

50 ESUPFAfll alpha-actinin actin-binding domain homology 7e-0? 

ESUPFAfO myosin motor domain homology 'le-EO 

ESUPFAFO tropomyosin Se-Ofi 

ESUPFAfO plectin ?e-07 

ESUPFAfO pleckstrin repeat homology 3e-0T 

55 ESUPFAfO trichohyalin Ee-10 

ESUPFAIO hypothetical protein MJ13SS Ee-Ob 

CSUPFAPO ribosomal protein S1Q homology 7e-07 

ESUPFAfll protein kinase C zinc-binding repeat homology 3e-0T 
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ESUPFAPO giantin le-lE 
ESUPFAMU protein kinase homology 1e-lb 
lESUPFAfll kinesin motor domain homology Be-OT 
ESUPFANJ human early endosome antigen 1 Se-13 
XSUPFAIU 115 protein He-OT 
ESUPFArD cytoskeletal keratin fie-Db 
EPR0SITE3 ATP_GTP_A 1 
CPROSITEJ RGD 1 
EKU3 All_Alpha 
EKIO L0bl_C0HPLEXITY 3-30 V. 

EKWJ C0ILED_C0IL 15- 12 '/. 
SEQ nAKRKEENFSSPKNAKRPRflEELEDFDKDGDEDECKGTTLTAAEVGIIESIHLKNFNCHS
SEG
PRD ccchhhhhcccccccccchhhhhhcccccccccccccccccccccceeeeehhhhhhccc
COILS

SEQ MLGPFKFGSNVNFVVGNNGSGKSAVLTALIVGLGGRAVATNRGSSLKGFVKDGflNSADIS
SEG xxxxxxxxxxxxxxxxxxx
PRD ccccccccceeeeeeccccccchhhhhhhhhhcccccccccccccceeeecccccceeee
COILS

SEQ ITLRNRGDDAFKASVYGNSILI<2(2HISIDGSRSYKLKSATGSVVSTRKEELIAILDHFNI
SEG
PRD eeeecccccccccccccccccchhhhhccccceeeeccccchhhhhhhhhhhhhhhhhhh
COILS

SEQ <JVDNPVSVLT(2EI1SKl3FLflSKNEGDKYKFFMKATflLE(3nKEDYSYIHETKERTKE(3IH(JG
SEG
PRD cccchhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ EERLTELKRflCVEKEERFflSIAGLSTMKTNLESLKHEMAUAVVNEIEKflLNAIRDNIKIG
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCC

SEQ EDRAARLDRKMEEflflVRLNEAEflKYKDIflDKLEKISEETNARAPECIIALKADVVAKKRAY
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ NEAEVLYNRSLNEYKALKKDDEt2LCKRIEELKKSTD<2SLEPERLER(JKKISlilLKERVKAF
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ <2N<2ENSVN<2EIE(2F<2<2AIEKDKEEHGKIKREELDVKHALSYN<2R(2LKELKDSKTDRLKRF
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
SEQ GPNVPALLEAIDDAYRflGHFTYKPVGPLGACIHLRDPELALAIESCLKGLLflAYCCHNHA
SEG
PRD hhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS

SEQ DERVLflALflKRFYLPGTSRPPIIVSEFRNEIYDVRHRAAYHPDFPTVLTALEIDNAVVAN
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhh
COILS

SEQ SLIDHRGIETVLLIKNNSVARAVI1l3S(2KPPKNCREAFTAI>GI>(JVFAGRYYSSENTRPKFL
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhh
COILS

SEQ SRDVDSEISDLENEVENKTA<2ILNLfl(3HLSALEOIKHNEELLKRC<2LHYKELKMKIRKN
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ ISEIRELENIEEHi3SVDIATLEDEA(3ENKSKMKI1VEEHriE(3(2KENriEHLKSLKIEAENKY
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEQ DAIKFKINflLSELADPLKDELNLADSEVDNflKRGKRHYEEKflKEHLDTLNKKKRELDMKE
SEG xxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCC

SEQ KELEEKf1S(2ARc2ICPERIEVEKSASILDKEINRLRf2KI<2AEHASHGDREEII"IR<2Y<2EARE
SEG xxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhh
COILS CCCCCCCCCCCCCC

SEQ TYLDLDSKVRTLKKFIKLLGEIHEHRFKTY(3dFRRCLTLRCKLYFDNLLSi2RAYCGKMNF
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccceee
COILS

SEQ DHKNETLSISVfiPGEGNKAAFNDMRALSGGERSFSTVCFILSLIilSIAESPFRCLDEFDVY
SEG
PRD eccccccceeeeccccchhhhhhcccccccchhhhhhhhhhhhhhhhccccchhhhhhhh
COILS

SEQ MDHVNRRIAi1DLILKI1ADS(3RFRi3FILLTPi3SHSSLPSSKLIRILRriSDPERG«3TTLPFR
SEG
PRD hhhhhhhhhhhhhhhhhhhhhhceeeeeeccccccccccceeeeeecccccccccccccc
COILS

SEQ PVTflEEDDDtJR
SEG
PRD chhhhhhhccc
COILS



Prosite for DKFZphamy2_lln4.1

PS00016 126->129 RGD PDOC00016

PS00017 76->84 ATP_GTP_A PDOC00017
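The two PROSITE hits above are pattern matches: PS00016 is the R-G-D cell-attachment pattern and PS00017 (ATP_GTP_A, the P-loop) is `[AG]-x(4)-G-K-[ST]`. As a hedged sketch only, the Python below translates the common subset of PROSITE pattern syntax (`x`, `x(n)`, `x(n,m)`, `[..]`, `{..}`, repeats such as `G(2)`) into a regular expression and scans a protein string; terminal anchors (`<`, `>`) are not handled, and `prosite_to_regex` and `find_motif` are hypothetical helpers, not the tool that produced this report.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as '[AG]-x(4)-G-K-[ST]' into a regex.

    Supports x, x(n), x(n,m), [..] alternatives, {..} exclusions and repeats like G(2).
    N-/C-terminal anchors ('<', '>') are not supported in this sketch.
    """
    tokens = []
    for element in pattern.rstrip(".").split("-"):
        # Split an element into its residue part and an optional (n) or (n,m) repeat.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, lo, hi = m.groups()
        if core == "x":
            token = "."                        # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"    # any residue except those listed
        else:
            token = core                       # single residue or [..] class
        if hi:
            token += "{%s,%s}" % (lo, hi)
        elif lo:
            token += "{%s}" % lo
        tokens.append(token)
    return "".join(tokens)


def find_motif(pattern: str, sequence: str) -> list[tuple[int, int, str]]:
    """All matches, overlapping ones included, as 1-based inclusive (start, end, text)."""
    # A capturing group inside a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(%s))" % prosite_to_regex(pattern))
    return [(m.start(1) + 1, m.end(1), m.group(1)) for m in regex.finditer(sequence)]


print(find_motif("[AG]-x(4)-G-K-[ST]", "MLGPFKFGSNVNFVVGNNGSGKSAVLTAL"))
print(find_motif("R-G-D", "ITLRNRGDDAFK"))
```

The two fragments are taken from the SEQ lines of this entry; positions are relative to each fragment, so the P-loop `GNNGSGKS` is reported at 16-23 and the RGD at 6-8.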



(No Pfam data available for DKFZphamy2_lln4.1)
group: cell structure and motility

DKFZphamy2_121f17 encodes a novel 251 amino acid protein with high similarity to the protein encoded by a rat ankyrin-binding glycoprotein-1-related mRNA.

Ankyrin-binding glycoproteins play a role in neural cell adhesion and in prostate tumor cell transformation. DKFZphamy2_121f17 is expressed above average in brain, uterus and prostate.

The new protein can find application in modulating cytoskeleton-membrane interactions.

similarity to ankyrin binding glycoprotein-1 related mRNA (Rattus norvegicus)

Sequenced by DKFZ 

Locus: /map= n l" 

Insert length: 1498 bp

Poly A stretch at pos. IM?^ polyadenylation signal at pos. mbO 
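The two positions on the line above come from a scan for a polyadenylation signal and for the 3' poly(A) stretch. The exact rules of the annotation pipeline are not stated here, so the Python below is only a sketch under explicit assumptions: it looks for the canonical AATAAA hexamer and its common ATTAAA variant, and it treats a terminal run of at least `min_tail` A's as the tail (the threshold of 10 is arbitrary). `find_polya_features` is a hypothetical helper; positions are 1-based within the string passed in.

```python
def find_polya_features(seq: str, min_tail: int = 10, signals=("AATAAA", "ATTAAA")):
    """Return (signal hits, tail start): hits as 1-based (position, hexamer), tail start or None."""
    seq = seq.upper()
    hits = []
    for motif in signals:
        i = seq.find(motif)
        while i != -1:
            hits.append((i + 1, motif))
            i = seq.find(motif, i + 1)   # advance by one so overlapping hits are kept
    hits.sort()
    # Length of the run of A's at the very 3' end of the sequence.
    tail_len = len(seq) - len(seq.rstrip("A"))
    tail_start = len(seq) - tail_len + 1 if tail_len >= min_tail else None
    return hits, tail_start


# Last 68 bases of this insert (listing positions 1431-1498).
fragment = ("TGTTACTGCA" "TCTGATTAAA" "TGTCTCCAGA" "AATAAAGAAT"
            "AATTCTGCCA" "AAAAAAAAAA" "AAAAAAAA")
print(find_polya_features(fragment))
```

On that fragment the sketch reports ATTAAA at offset 15 and AATAAA at offset 31 (bases 1445 and 1461 of the insert) and a terminal A run starting at offset 50 (base 1480).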



1 CGGCACCTTC GCCGGCGCCC TCGCCCACCC CAGCCCCGCC CCAGAAGGAG
51 CAGCCCCCCG CGGAGACCCC TACAGACGCT GCTGTCTTGA CCTCACCCCC
101 AGCCCCTGCT CCCCCGGTGA CCCCTAGCAA ACCAATGGCC GGCACCACAG
151 ACCGAGAAGA AGCCACTCGG CTCTTGGCTG AGAAGCGGCG CCAGGCCCGG
201 GAGCAGCGGG AGCGCGAGGA GCAGGAGCGG AGGCTGCAGG CAGAAAGGGA
251 CAAGCGAATG CGAGAGGAGC AGCTGGCACG GGAGGCCGAG GCCCGGGCGG
301 AGCGGGAGGC GGAGGCCCGG AGGCGGGAGG AGCAGGAGGC ACGAGAGAAG
351 GCGCAGGCCG AGCAGGAGGA GCAGGAGCGG CTGCAGAAGC AGAAAGAGGA
401 GGCCGAAGCT CGGTCGCGGG AAGAGGCGGA GCGGCAGCGT CTGGAGCGGG
451 AAAAGCACTT CCAGCAGCAG GAGCAAGAGC GGCAAGAGCG CAGAAAGCGT
501 CTGGAGGAGA TCATGAAGAG GACTCGGAAG TCAGAAGTTT CTGAAACCAA
551 GAAGCAGGAC AGCAAGGAGG CCAACGCCAA CGGTTCCAGC CCAGAGCCTG
601 TGAAAGCTGT GGAGGCTCGG TCCCCAGGGC TGCAGAAGGA GGCTGTGCAG
651 AAAGAGGAGC CCATCCCACA GGAGCCTCAG TGGAGTCTCC CAAGCAAGGA
701 GTTGCCAGCG TCCCTGGTGA ATGGCCTGCA GCCTCTCCCA GCACACCAGG
751 AGAATGGCTT CTCCACCAAC GGACCCTCTG GGGACAAGAG TCTGAGCCGA
801 ACACCAGAGA CACTCCTGCC CTTTGCAGAG GCAGAAGCCT TCCTCAAGAA
851 AGCTGTGGTG CAGTCCCCGC AGGTCACAGA AGTCCTTTAA GAGGGTTTGC
901 CTTGGATCCG GGCACAGTTG TGAGGGCTCC TCTGCATCAC CTACCAGGAT
951 GTCTGGAGGA GAAAAAGACA GAACAAAGAT GGAAGTGGCC TGGGCCCCTG
1001 GGGGTGGGTC CTCTCTGTTG TTTTTAATCT GCACCTTATA GACTGATGTC
1051 TCTTTGGCCG GAGCCAGATC TGCCCCTCAG TGCATTCGTG TGCTCGCACG
1101 CGCAGACATC CCTTCTCCCC CATACACACA TATACACTCA CAGCCTCTCT
1151 GGCCTCTTCC CTTGGGGAGG GGCCACCTGT AGTATTTGCC TTGATTTGGT
1201 GGGGTACAGT GGATGTGAAT ACTGTAAATA GCTTGTGCTC AGACTCCTCT
1251 GCGTGGAGAG GGTGGGTGCA GGAGGCAGAC CCTCCCCCCA AAGCCCCCTG
1301 GGGAGATCTT CCTCTCTCTA TTTAACTGTA ACTGAGGGGG ATCCCAGGTC
1351 TGGGGATGGG GGACACCTTG GGCCACAGGA TACTGGTTGC TTCAGGGGTA
1401 CCCATGCCCC CTGCCCTCGC CTGGAATCAG TGTTACTGCA TCTGATTAAA
1451 TGTCTCCAGA AATAAAGAAT AATTCTGCCA AAAAAAAAAA AAAAAAAA
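Every position label in these listings is fixed by the layout (50 bases per line in blocks of ten), so an extraction-damaged label such as `SI`, `bOl` or `mOl` can be detected mechanically and the correct value regenerated. A minimal Python sketch, assuming each non-blank line holds one label followed by whitespace-separated sequence blocks; `parse_listing` is a hypothetical helper:

```python
def parse_listing(lines):
    """Join a numbered sequence listing and flag labels that disagree with the running position."""
    parts, bad = [], []
    expected = 1                          # 1-based position of the next base
    for line in lines:
        if not line.strip():
            continue                      # skip blank spacer lines
        label, *blocks = line.split()
        chunk = "".join(blocks)
        if not label.isdigit() or int(label) != expected:
            bad.append((label, expected)) # damaged label and the value it should carry
        parts.append(chunk)
        expected += len(chunk)
    return "".join(parts), bad


seq, bad = parse_listing(["1 ACGTACGTAC", "SI GGGGGCCCCC"])
print(len(seq), bad)
```

Applied to this entry's listing (29 full lines plus a final line of 48 bases), the joined sequence is 1498 bases long, which matches the stated insert length.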



BLAST Results

No BLAST result

Medline entries

No Medline entry



Peptide information for frame 3 



ORF from 135 bp to 887 bp; peptide length: 251
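ORF coordinates of this kind come from scanning a reading frame for an ATG and the next in-frame stop codon. The Python sketch below does this on the forward strand only; it reports the end as the last base of the final sense codon (stop codon excluded), so the peptide length is (end - start + 1) / 3. Pipelines differ on whether the stop is counted, so that convention is an assumption here, and `find_orfs` is a hypothetical helper. With it, the ATG at base 135 of this insert followed by 251 sense codons ends at base 887, with the TAA stop at 888-890.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_aa: int = 1) -> list[tuple[int, int, int]]:
    """ATG-to-stop ORFs in the three forward frames as 1-based (start, end, aa_length).

    `end` is the last base of the final sense codon; the stop codon is not included.
    ORFs without an in-frame stop before the end of the sequence are not reported.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                              # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3              # sense codons, ATG included
                if aa_len >= min_aa:
                    # i is the 0-based index of the stop, i.e. the 1-based last coding base
                    orfs.append((start + 1, i, aa_len))
                start = None
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC"))
```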
Category: putative protein 

Classification: Cell signaling/communication 

1 MAGTTDREEA TRLLAEKRRQ AREQREREEQ ERRLQAERDK RMREEQLARE
51 AEARAEREAE ARRREEQEAR EKAQAEQEEQ ERLQKQKEEA EARSREEAER
101 QRLEREKHFQ QQEQERQERR KRLEEIMKRT RKSEVSETKK QDSKEANANG
151 SSPEPVKAVE ARSPGLQKEA VQKEEPIPQE PQWSLPSKEL PASLVNGLQP
201 LPAHQENGFS TNGPSGDKSL SRTPETLLPF AEAEAFLKKA VVQSPQVTEV
251 L

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_121f17, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphamy2_121f17, frame 3

Report for DKFZphamy2_121f17.3

[LENGTH] 295
[MW] 33S17.^fe.
[pI] 5.61
[HOMOL] TREMBLNEW:AB033013_1 gene: "KIAA1187"; product: "KIAA1187 protein"; Homo sapiens mRNA for KIAA1187 protein, partial cds. 1e-64

[BLOCKS] PFDimDD
[BLOCKS] BLODmSD Neuromodulin (GAP-43) proteins
[BLOCKS] BL00fl2bC
[BLOCKS] BLDDM55C Granins proteins
[BLOCKS] PRD01t7C
[BLOCKS] PF0DT1BA
[BLOCKS] BL00^B Clathrin light chain proteins
[BLOCKS] PROOOMID
[BLOCKS] PRDOIIOA
[KW] All_Alpha
[KW] LOW_COMPLEXITY 51.11 %
[KW] COILED_COIL 10.51 %
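The percentage keywords in these reports appear to be the share of residues flagged in the corresponding annotation track. For example, the SEQ track of this entry holds 295 residues (four lines of 60 plus one of 55) and its COILS track marks 13 + 18 = 31 of them, which gives 10.51 %. A hedged Python sketch of that bookkeeping follows; it assumes the track strings passed in contain only the annotation characters (labels removed), and `track_percent` and `summarize_tracks` are hypothetical helpers.

```python
def track_percent(track: str, symbols: str, length: int) -> float:
    """Percentage of `length` residues marked with any character in `symbols`."""
    return 100.0 * sum(track.count(s) for s in symbols) / length


def summarize_tracks(prd: str, seg: str, coils: str, length: int) -> dict:
    """Secondary-structure (PRD h/e/c), low-complexity (SEG x) and coiled-coil (COILS C) shares."""
    return {
        "helix": track_percent(prd, "h", length),
        "strand": track_percent(prd, "e", length),
        "coil": track_percent(prd, "c", length),
        "low_complexity": track_percent(seg, "xX", length),
        "coiled_coil": track_percent(coils, "C", length),
    }


print(round(track_percent("C" * 31, "C", 295), 2))
```

The same arithmetic reproduces the 3.75 % low-complexity figure reported for DKFZphamy2_121m2, whose SEG track masks 16 + 2 = 18 of 480 residues.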



SEQ APSPAPSPTPAPPQKEQPPAETPTDAAVLTSPPAPAPPVTPSKPMAGTTDREEATRLLAE
SEG XXXXXXXXXXXXXXXXXXXXXXX • • XXXXXXXXXXXXXXXXXXXXX .-XX
PRD cccccccccccccccccccccccccceeeeccccccccccccccccccchhhhhhhhhhh
COILS

SEQ KRRQAREQREREEQERRLQAERDKRMREEQLAREAEARAEREAEARRREEQEAREKAQAE
SEG XXXXXXXXXXXXXXXXXXXXXX- • - • xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS CCCCCCCCCCCCC

SEQ QEEQERLQKQKEEAEARSREEAERQRLEREKHFQQQEQERQERRKRLEEIMKRTRKSEVS
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccc
COILS CCCCCCCCCCCCCCCCCC

SEQ ETKKQDSKEANANGSSPEPVKAVEARSPGLQKEAVQKEEPIPQEPQWSLPSKELPASLVN
SEG
PRD chhhhhhhhhccccccccceeeeccccccchhhhhhhhcccccccccccccccccceeee
COILS

SEQ GLQPLPAHQENGFSTNGPSGDKSLSRTPETLLPFAEAEAFLKKAVVQSPQVTEVL
SEG
PRD ecccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhccccccccc
COILS

(No Prosite data available for DKFZphamy2_121f17.3)

(No Pfam data available for DKFZphamy2_121f17.3)
DKFZphamy2_121m2

PCT/IB01/02050

group: cell cycle

DKFZphamy2_121m2 encodes a novel 480 amino acid protein with similarity to the human PA26-T2 protein.

PA26-T2 is encoded by a p53-responsive gene. The protein is predominantly expressed in brain, breast and kidney and may be a novel regulator of cellular growth. Its isoforms are differentially induced by genotoxic stress (UV, gamma irradiation and cytotoxic drugs) in a p53-dependent manner.

The new protein can find application in modulating cell division and apoptosis pathways.

similarity to PA26 nuclear protein isoforms (Homo sapiens)
probably differential polyadenylation
Sequenced by DKFZ 
Locus: unknown 



Insert length: 3327 bp 

Poly A stretch at pos. 330b, polyadenylation signal at pos. 3271



1 TCCAGCACCA AAGCGGCCGT TCTCGGATTC CGGAGCGTTC TGGAGCCCCG
51 AGAGACGCCC CGGGGTTCTA GAAGCTCCCC GGCGGCGCCC AGTCCCGGCT
101 TCATTCGGGC GTCCCTCCGA AACCCACTCG GGTGCACGGG TCGTCGGCGA
151 GCCGCGACCG GGTCCTGGCG CGCACCATGA TCGTGGCGGA CTCCGAGTGC
201 CGCGCAGAGC TCAAGGACTA CCTGCGGTTC GCCCCGGGCG GCGTCGGCGA
251 CTCGGGCCCC GGAGAGGAGC AGAGGGAGAG CCGGGCTCGG CGAGGCCCTC
301 GAGGGCCCAG CGCCTTCATC CCCGTGGAGG AGGTCCTTCG GGAGGGGGCT
351 GAGAGCCTCG AGCAGCACCT GGGGCTGGAG GCACTGATGT CCTCTGGGCG
401 AGTAGACAAC CTGGCAGTGG TGATGGGCCT GCACCCTGAC TACTTTACCA
451 GCTTCTGGCG CCTGCACTAC CTGCTGCTGC ACACGGATGG TCCCTTGGCC
501 AGCTCCTGGC GCCACTACAT TGCCATCATG GCTGCCGCCC GCCATCAGTG
551 TTCTTACCTG GTAGGCTCCC ACATGGCCGA GTTTCTGCAG ACTGGTGGTG
601 ACCCTGAGTG GCTGCTGGGC CTCCACCGGG CCCCCGAGAA GCTGCGCAAA
651 CTCAGCGAGA TCAACAAGTT GCTGGCGCAT CGGCCATGGC TCATCACCAA
701 GGAACACATC CAGGCCTTGC TGAAGACCGG CGAGCACACT TGGTCCCTGG
751 CCGAGCTCAT TCAGGCTCTG GTCCTGCTCA CCCACTGCCA CTCGCTCTCC
801 TCCTTCGTGT TTGGCTGTGG CATCCTCCCT GAGGGGGATG CAGATGGCAG
851 CCCTGCCCCC CAGGCACCTA CACCCCCTAG TGAACAGAGC AGCCCCCCAA
901 GCAGGGACCC GTTGAACAAC TCTGGGGGCT TTGAGTCTGC CCGCGACGTG
951 GAGGCGCTGA TGGAGCGCAT GCAGCAGCTG CAGGAGAGCC TGCTGCGGGA
1001 TGAGGGGACG TCCCAGGAGG AGATGGAGAG CCGCTTTGAG CTGGAGAAGT
1051 CAGAGAGCCT GCTGGTGACC CCCTCAGCTG ACATCCTGGA GCCCTCTCCA
1101 CACCCAGACA TGCTGTGCTT TGTGGAAGAC CCTACTTTCG GATATGAGGA
1151 CTTCACTCGG AGAGGGGCTC AGGCACCCCC TACCTTCCGG GCCCAGGATT
1201 ATACCTGGGA AGACCATGGC TACTCGCTGA TCCAGCGGCT TTACCCTGAG
1251 GGTGGGCAGC TGCTGGATGA GAAGTTCCAG GCAGCCTATA GCCTCACCTA
1301 CAATACCATC GCCATGCACA GTGGTGTGGA CACCTCCGTG CTCCGCAGGG
1351 CCATCTGGAA CTATATCCAC TGCGTCTTTG GCATCAGATA TGATGACTAT
1401 GATTATGGGG AGGTGAACCA GCTCCTGGAG CGGAACCTCA AGGTCTATAT
1451 CAAGACAGTG GCCTGCTACC CAGAGAAGAC CACCCGAAGA ATGTACAACC
1501 TCTTCTGGAG GCACTTCCGC CACTCAGAGA AGGTCCACGT GAACTTGCTG
1551 CTCCTGGAGG CGCGCATGCA AGCCGCTCTG CTGTACGCCC TCCGTGCCAT
1601 CACCCGCTAC ATGACCTGAC TCCTGAGCAG GACCTGGGCC CGGTTCAGCT
1651 CCCCACAAGG ACTTCTCTGT CTGGAGACAG CCCCAGACCC TTTTGTGTCC
1701 CATGCCCACC CTCCCCACGC TGCAGTGGGC TTGTGTGTGA TGTGCAGTCC
1751 CGAAGCCACA CCCTCCCTTT TCCTCACTGG AATGGACAGT TCATTGCACT
1801 GACTCTGGGA TCTCAGCCCT GCTCCTGGGA GCTGGAAGAG CACTTGGAGA
1851 TCCTAAGGGA CCACACCCTT CCTCCTTCCC CTGCCCACAG AGGCAGAGGG
1901 CACAGGAAAG AAGCCGGGCC AAGCTCGGAA TTAATGTGCC ACAAGTGTTG
1951 TGGCCTTCCT GAACTGGGAA GTCCCTGGCT GGCCCCCGGG GGAGAGGGGC
2001 AAATGCCTCC GGGACTGACA CTCCAGGCAG CTTTGCCTTC TCTCCCCTGT
2051 CATTTCCAGA TTTCATTACC TCCTACTTGC CATTCACCCA TCAATGTGAA
2101 AGTCAGGGTC ACAGCTGGTC TGTGTGTCCA GTTCCCTAAA AGCCTGTTCT
2151 GTTGGGCAGC CTGAGGCTGT TGCCCGAATC CTAGTTCAGT TTTTTGACTT
2201 CCTTTGCCCT TTTTCCCTTT TCTCCATGCT TAATGGTGTG AGGCGTCAGG
2251 AGAGAGGCCA AGTACATAAA AAAAAAAAAA AGCAGATTAT CTCTAGAGAG
2301 TTTGAGCCTT TGCTGGTCAC ATTGCCTTCT GAAGAGGAGG GAGTATTAGA
2351 TTATAAATCC TCTTTATTTT GGTCCTTTAT GCTTGAGGTT CCAACCTGGA
2401 GCCACAGTGT GTGAGAGGAG GAGGAGAGGG AGAATTCTGT TCTCCCAGAG
2451 CTGCACCTGC CTCGCAGAGG CCAGCACCCC ACTCTCCTGC CTCCAGTGGC
2501 CCTGCCGCAG ATGTCTCCCA AAAAGTTGAG CCTTTCTAGA TGGCTTAGGT
2551 GGCACCATGG CTCAGCAGGA GGGGQGGGkG GCACCAGGGT TCTTGTTTGG
2601 ACCCTGCCCC TGGGCCATGG CCAGGTGACC ATGGCTACAT TGCCAAACCT
2651 CTGACTGCCA CAGCTGCAGA CTGAGAGGGT GGGTCTGAGT CCCCACAATG
2701 TCTGAAGCTG CCCCTGGGAT TCTCAGGCCA ACCTGCCAAC AGCAAGCGGA
2751 TTTTCTTGCA AGATCAGGGA CCCCATTTCT GCAGCCAGTG TCTCCTGGGT
2801 GCCTTCTGAG GACTCCCACC CCCATCCCAG TATCTCATCT GTCCCCTCTC
2851 CTGGGGCTTA AGTGGGTTGC TTCCAGGCAG AAGCAGCCAA GGACCGATTC
2901 CAGGCACTTT CTGTAGCAAA TGACTGTGAA TTACGACTTC TCTTGCCCTT
2951 CTTCTAGCAG TCTGTGCCTC CTCTCTGACC AGTTTGGAGG GCACTGAAGA
3001 AAGGCAAGGG CCGTGCTGCT GCTGGGCGGG GCAGGAGAGG AGCCTGGCCA
3051 GTGTGCCACA TTAAATACCC GTGCAGGCGC GGAGAAGCAA CCGGCACCCC
3101 CTTCCGGCCT GAAAGCCCTC CCTGCAAGAA GGTGTGCAGG AGAGAAGAGG
3151 CCCCGGCATG GGGATCTGGG TTCTAGAGGG CATGTGATGA CTGTAAATGT
3201 TCACTGGGTG GGTAGGGAGT GGTATCCAGT GTTCAAGTGC AGAAATCTTT
3251 GGCTTTGCTA CCAGTTCCAT ATGATGAGAA ATAAACGTTC GCTGAGGTTT
3301 TGTTTCATAA AAAAAAAAAA AAAAAAA



BLAST Results

No BLAST result

Medline entries

1502M170:

Buckbinder L, Talbott R, Seizinger BR, Kley N. Gene regulation by temperature-sensitive p53 mutants: identification of p53 response genes. Proc Natl Acad Sci U S A 91(22):10640-10644 (1994).

112H117: 


WO 01/98454 PCT/IB01/02050 

Velasco-Miguel S, Buckbinder L, Jean P, Gelbert L, Talbott R, Laidlaw J, Seizinger B, Kley N. PA26, a novel target of the p53 tumor suppressor and member of the GADD family of DNA damage and growth arrest inducible genes. Oncogene 1999 Jan 7;18(1):127-37.



Peptide information for frame 3 



ORF from 177 bp to 1616 bp; peptide length: 480

Category: strong similarity to known protein 
Classification: Cell division 

1 MIVADSECRA ELKDYLRFAP GGVGDSGPGE EQRESRARRG PRGPSAFIPV
51 EEVLREGAES LEQHLGLEAL MSSGRVDNLA VVMGLHPDYF TSFWRLHYLL
101 LHTDGPLASS WRHYIAIMAA ARHQCSYLVG SHMAEFLQTG GDPEWLLGLH
151 RAPEKLRKLS EINKLLAHRP WLITKEHIQA LLKTGEHTWS LAELIQALVL
201 LTHCHSLSSF VFGCGILPEG DADGSPAPQA PTPPSEQSSP PSRDPLNNSG
251 GFESARDVEA LMERMQQLQE SLLRDEGTSQ EEMESRFELE KSESLLVTPS
301 ADILEPSPHP DMLCFVEDPT FGYEDFTRRG AQAPPTFRAQ DYTWEDHGYS
351 LIQRLYPEGG QLLDEKFQAA YSLTYNTIAM HSGVDTSVLR RAIWNYIHCV
401 FGIRYDDYDY GEVNQLLERN LKVYIKTVAC YPEKTTRRMY NLFWRHFRHS
451 EKVHVNLLLL EARMQAALLY ALRAITRYMT
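The peptide above is the conceptual translation of the ORF that starts with the ATG at base 177. A sketch of that step in Python, using the standard genetic code (NCBI translation table 1) laid out in the usual TCAG codon order; `translate_orf` is a hypothetical helper, not part of the original annotation pipeline, and codons containing non-ACGT characters translate to `X`.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ... (first base slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(dna: str, start: int) -> str:
    """Translate from a 1-based start position up to (not including) the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")   # X for codons with ambiguous bases
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Bases 171-230 of this insert; the ATG at fragment position 7 is base 177.
fragment = ("CGCACCATGA" "TCGTGGCGGA" "CTCCGAGTGC"
            "CGCGCAGAGC" "TCAAGGACTA" "CCTGCGGTTC")
print(translate_orf(fragment, 7))
```

On this fragment the sketch yields MIVADSECRAELKDYLRF, the first 18 residues of the frame 3 peptide.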



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_121m2, frame 3

TREMBL:AF033120_1 gene: "PA26"; product: "p53 regulated PA26-T2 nuclear protein"; Homo sapiens p53 regulated PA26-T2 nuclear protein (PA26) mRNA, complete cds., N = 1, Score = 1377, P = 1.7e-141

TREMBL:AF033122_1 gene: "PA26"; product: "non-p53 regulated PA26-T1 nuclear protein"; Homo sapiens non-p53 regulated PA26-T1 nuclear protein (PA26) mRNA, complete cds., N = 1, Score = 1363, P = 3e-13T

TREMBL:AF033121_1 gene: "PA26"; product: "p53 regulated PA26-T3 nuclear protein"; Homo sapiens p53 regulated PA26-T3 nuclear protein (PA26) mRNA, complete cds., N = 1, Score = 1307, P = 2.5e-133

>TREMBL:AF033120_1 gene: "PA26"; product: "p53 regulated PA26-T2 nuclear protein"; Homo sapiens p53 regulated PA26-T2 nuclear protein (PA26) mRNA, complete cds.

Length = 492

HSPs:

Score = 1377 (SOb-b bits), Expect = 1.7e-141, P = 1.7e-141
Identities = 377/471 (80%), Positives = 334/471 (?D^>
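Expect and P are both given as 1.7e-141 in this HSP summary because, for very small values, the two are practically identical. Assuming the usual Karlin-Altschul convention that the number of chance hits at or above the score is Poisson-distributed with mean E, the probability of at least one such hit is P = 1 - exp(-E). A short Python sketch of the conversion in both directions; the helper names are hypothetical, and `math.expm1`/`math.log1p` are used so that tiny values keep full precision.

```python
import math


def evalue_to_pvalue(e: float) -> float:
    """P = 1 - exp(-E), computed with expm1 to avoid rounding 1 - 1 to zero for tiny E."""
    return -math.expm1(-e)


def pvalue_to_evalue(p: float) -> float:
    """Inverse relation E = -ln(1 - P), computed with log1p."""
    return -math.log1p(-p)


print(evalue_to_pvalue(1.7e-141))
print(evalue_to_pvalue(1.0))
```

For E = 1 the difference is visible (P is about 0.632); for the E values in these reports, P and E coincide to the printed precision.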



fluery: EE GVGDSGPGEE(2RESRARRGPR GPSAFIPVEEVLREGAESLEflH- 

LGLEALMSSGRV 7b 

G G G +a E R PR GPS FIP +E+L+ G+E + H L 

++ + GR+ 
15 Sbjct: EE 

GCKt2CGGGRD<2DEELGIRIPRPLG<2GPSRFIPEKEIL<2VGSEl)A(2l1HALFAI)SFAALGRL fll 



<2uery: 77 

DNLAVVnGLHPDYFTSFURLHYLLLHTDGPLASSURHYIAIMAAARHflCSYLVGSHriAEF 13b 
20 DN+ +VM HP Y SF + + LL DGPL +RHYI 

IflAAARHflCSYLV H+ +F 
Sbjct: as 

]>NITLVI1VFHP(3YLESFLKTflHYLLl3riDGPLPLHYRHYIGII1AAARHaCSYLVNLHVNI>F 141 

25 fluery: 13? 

LflTGGDPEULLGLHRAPEKLRKLSEINKLLAHRPWLITKEHI(2ALLKTGEHTIilSLAELI<2 lib 
L GGDP+WL GL AP+KL+ L E+NK+LAHRPULITKEHI+ LLK 

EH+USLAEL+ 
Sbjct: ms 

30 LHVGGDPKULNGLENAP(3KL(3NLGELNKVLAHRPULITKEHIEGLLKAEEHSIilSLAELVH EDI 

fluery: 117 ALVLLTHCHSLSSFVFGCGILPEGDADGXXXXXXXXXX 

XXXXXXXXRDPLNNS S41 

A+VLLTH HSL+SF FGCGI PE 1>G 

35 P+N++ 

Sbjct: EOE 

AVVLLTHYHSLASFTFGCGISPEIHCDGGHTFRPPSVSNYCICDITNGNHSVDEflPVNSA Ebl 

fluery: ESQ GGF ESARDVEALMERN(2(2L<3ESLLR1>EG- 

40 TSJJEEMESRFELEKSESLLVTPSAPILE 305 

+S +VEALME+I1+I2LI2E RDE SdEEfl SRFE+EK ES+ V 

S+D E 

Sbjct: EbS ENVSVSDSFFEVEALMEKI1R(2L<3EC~ 
RDEEEASflEEIIASRFEIEKRESIlFVF-SSDDEE 318 

45 

<2uery: 30b 

PSPHPDMLCFVEDPTFGYEDFTRRGAfiAPPTFRAflDYTbJEDHGYSLIdRLYPEGGflLLDE 3bS 
+P + ED ++GY+DF+R G P TFR (32>Y UEDHGYSL+ 

RLYP+ GiJL+DE 
50 Sbjct: 31T VTPARAVSRHFEDTSYGYKDFSRHGUHVP- 
TFRVflDYCUEDHGYSLVNRLYPDVGflLIDE 377 



Query: 3bb 

KFflAAYSLTYNTIAflHSGVPTSVLRRAIUNYIHCVFGIRYDDYDYGEVNflLLERNLKVYI 4S5 
55 KF AY+LTYNT+AHH 

VDTS+LRRAIUNYIHC+FGIRYDDY])YGE+N(JLL+R+ KVYI 
Sbjct: 37fl 

KFHIAYNLTYNTflAflHKDVDTSnLRRAIblNYIHCriFGIRYDDYDYGEINflLLDRSFKVYI 437 
Query: 426

KTVACYPEKTTRRMYNLFURHFRHSEKVHVNLLLLEARntfAALLYALRAITRYNT MAD 
KTV C PEK T+RMY+ FUR F+HSEKVHVNLLL+EARIl£3A 

5 LLYALRAITRYMT 
Sbjct: H3fl 

KTVVCTPEKVTKRMYDSFUR<2FKHSEKVHVNLLLIEARM(2AELLYALRAITRYMT M12 
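The Identities figure in the HSP summary counts aligned columns in which query and subject carry the same residue, divided by the number of aligned columns; in the usual BLAST convention gap columns count toward that length, so 377/471 rounds to 80 %. A minimal Python sketch for two rows of a gapped alignment (`-` marks a gap); `alignment_identity` is a hypothetical helper, and the sample rows are the first nine columns of the last alignment block above (query KTVACYPEK against subject KTVVCTPEK).

```python
def alignment_identity(query: str, subject: str) -> tuple[int, int, float]:
    """Identities, alignment length (gap columns included) and percent identity of two aligned rows."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return identities, len(query), 100.0 * identities / len(query)


print(alignment_identity("KTVACYPEK", "KTVVCTPEK"))
```

Positives would additionally need a substitution matrix (counting columns with a positive score), which this sketch does not attempt.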
Pedant information for DKFZphamy2_121m2, frame 3

Report for DKFZphamy2_121m2.3

[LENGTH] 480
[MW] 51413. *ia
[pI] 5.57
[HOMOL] TREMBL:AF033120_1 gene: "PA26"; product: "p53 regulated PA26-T2 nuclear protein"; Homo sapiens p53 regulated PA26-T2 nuclear protein (PA26) mRNA, complete cds. 1e-151
[BLOCKS] PRGOOnD
[KW] All_Alpha
[KW] LOW_COMPLEXITY 3.75 %

25 

SEQ MIVADSECRAELKDYLRFAPGGVGDSGPGEEQRESRARRGPRGPSAFIPVEEVLREGAES
SEG
PRD cccchhhhhhhhhhhhhccccccccccccchhhhhhhhccccccccccccchhhhhhhhhh

SEQ LEQHLGLEALMSSGRVDNLAVVMGLHPDYFTSFWRLHYLLLHTDGPLASSWRHYIAIMAA
SEG
PRD hhhhhhhhhhhhhccccccceeeeccccchhhhhhhhhhhhcccccchhhhhhhhhhhh

SEQ ARHQCSYLVGSHMAEFLQTGGDPEWLLGLHRAPEKLRKLSEINKLLAHRPWLITKEHIQA
SEG
PRD hhhhhheeeccccceeeecccccccccccccchhhhhhhhhhhhhhhhccceeehhhhhh

SEQ LLKTGEHTWSLAELIQALVLLTHCHSLSSFVFGCGILPEGDADGSPAPQAPTPPSEQSSP
SEG xxxxxxxxxxxxxxxx
PRD hhhhhcchhhhhhhhhhhhhhhhccccccccccccccccccccccccccccccccccccc

SEQ PSRDPLNNSGGFESARDVEALMERMQQLQESLLRDEGTSQEEMESRFELEKSESLLVTPS
SEG xx
PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhheeeeccc

SEQ ADILEPSPHPDMLCFVEDPTFGYEDFTRRGAQAPPTFRAQDYTWEDHGYSLIQRLYPEGG
SEG
PRD ccccccccccceeeeccccccccccccccccccccceeeeeeccccccceeeeecccccc

SEQ QLLDEKFQAAYSLTYNTIAMHSGVDTSVLRRAIWNYIHCVFGIRYDDYDYGEVNQLLERN
SEG
PRD hhhhhhhhhhhhhcccceeecccccchhhhhhhhhhhhhhccccccccccchhhhhhhhh

SEQ LKVYIKTVACYPEKTTRRMYNLFWRHFRHSEKVHVNLLLLEARMQAALLYALRAITRYMT
SEG
PRD hheeeeeeeecccchhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhccc

(No Prosite data available for DKFZphamy2_121m2.3)
(No Pfam data available for DKFZphamy2_121m2.3)
group: transmembrane protein

DKFZphamy2_121o17 encodes a novel 212 amino acid protein without similarity to known proteins.

The novel protein contains one transmembrane region. There were no informative BLAST results and no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of amygdala-specific genes and as a new marker for amygdala cells.

unknown protein

Pedant: TRANSMEMBRANE 1
Sequenced by DKFZ
Locus: /map="186.6 cR from top of Chr22 linkage group"
Insert length: 2690 bp

Poly A stretch at pos. Sttli, polyadenylation signal at pos. 2634

30 

1 TGCTGGGAAA AGTGACTGCG ATTCTGAAGA ACCGCTGCCT TGCAAGGTCA
51 AGGACATTCA GTGGTTGCTG GGGTCCGCAG ACTACTGCCA CCCACTCACC
101 ATCAACTCTG TTAGCCCAAT TGCCCTGCTG AACAACTGCC TGAATACAGG
151 CTTTAGGTTC CCCTGGACTC CAGCCAAGGC TGTTCAGGTG GGACCATGGT
201 GCTCTTTAAG CGTGATCGGA GGGAAGACAC ACAGCAGGGC CACCATTCCA
251 TGAATGGGAG GTGTACAGAT CACTTTCTCT TTGTGCTCAG TTCTCTTCTG
301 TCTCCAGCAG CTATATTGGT AAGACTAGTA CCTGCCAGGG AGAGGTGCCC
351 CCAAGTGAAG GGGTACAGTG GCACCTGGGA AAAGGCACCT GGAAGGTTTC
401 CATGTGGCCC AGCCCAGCAT GGAAGCAGGG TGGGAACTCT GCTGTGTCGC
451 CAGCCCTCAC TCTACTCAAG TGGCTTTTTG AGAGCCCTGC CATGTCTGTG
501 TCAGGCCTGT GCTGCTTCAC ACCCTACAGC TGCCTGGGAA AGGCCGGCCA
551 CGCTCCCTGT CCACACACTC CCTGTCCACA CACTCCCTGT CCACAACTGC
601 AGCCGGGCCC TCTGCCTATG GGCACCCAAT CCAAGCAGCT GCTCCACCTT
651 TGTTTGGCAT GGTGATTTGT GTTTTTTCTC TTGGTGCTTA TGTGTGTGGG
701 CTTGGGACGA GTGCTGGTAT GCACTTAGGA CCTTCTTGAT AGCTCCCTGC
751 ACTTTGGAAC ACGGAGCAGA TGAGAGAGGG TCAGGGGCTT GCCCTCCACC
801 TTGGACTTGG AAGAAGCCCA CATTGGAGAG GTGAGGACCC CATGGTGGCT
851 CTAGTGGAAG ATACGTTAGT CTCCAGCTAA GGAGGATGAG GCGCAGCCCC
901 AGAGGGAGAC CTCAGTGATA GGGGATCAGG CTACGAAAGT GGGGGAAGGG
951 AGATGCTTTG TACATATTTT GGGGTTATAA TTTCTCTAAA TTTTAGGAGA
1001 ACGGGTATTG ATTGATAAAA GGGACAGGCA GTAGTGTTCA ACAGTGCATG
1051 TGAAGGAAAG TTCTGTTTTC CATGGTTTTG ACATTCTTTG GACTGTATTG
1101 TGACTGCTGT CTGGTCCACA TGGTACCCTT TTGGTAAGTA GGCTTCAGTG
1151 CATACCAGGG TATCACTGGA GATGGGAGTT AGTGAAGGGG TGACTCCCTG
1201 GCCTAGTATA GTGTGACCCT GGGACAACTT AATGTCCTAA AGCATTTTGG
1251 TGACTTCTAG GGAATAGCAA AGACCTATTT CATTGTCCCC AGGTAAGTAT
1301 GTGATGAGCA ATGAGGAGGA GTGGAAAACA AAACCCAGAA AGTGCGGCAG
1351 GACCAGCCTG ACGCACACGC TCCTGTTGTC ATGGCAGACA GCCGCCTTGG
1401 GTGGGCACCA CCCTGGCAGT TCCAGCCTGT AGGGGAGTGA AGGGACATGG
1451 CTGAGCTGGG CATGTGCTGA GGTTGACTTA GGGAACAAGC CCTGGGATTG
1501 GACAAAAGGG CCCATGCTGC AGCCACTGAC TGGGGGCAGA GCTCTGGGTG
1551 GAAGAGGGAA GAGATCCTAA TGGAGGCGCC TCCATCTGCA ACCACAGTTG
1601 TAAGGCTCAT GGCACCTCTG CTTGGAAAGC ACTGGTTTAG GGACTTAGAG
1651 AGGTAGGCAC AAGGTGGGTC TCCTGGGTAA GGGAAGCAAG AGCAGACTGT
1701 TGGGCCAACA GGAGAAGCTC CCCAGAGTAG GGGAGAAGGT TGGGGTGTAG
1751 GGCCTTCCAC GTGGAACAGA CAGCCCCTGT GTCTCTGTCT CTTGGGGACC
1801 TGAGTTTGGG TGGGGTGGCA GTTGGCACAG CGCAGATGCG GTAGAGATGG
1851 GAGGAAACCC AGCTCCTCAC TTCCGTGTGC CTCATGCCTT TGCATACACA
1901 AGCACCAAAC CTACTAGGTC TTCTCATTAC CCATGTAAAC CACATGTTAG
1951 ATAAATTTTT GCAAGTAGAG GAAAGAAGGA AATAAAACAT CACATTTTGG
2001 TGTCTCTCAG GCTTTCCCCC CCAACTATGG TTTCTTTGCT TTTTGTTTTA
2051 ACATAGTTTT GTTGCTGTCT TCTGTAATGA TACAGTTTTG TGCAGCTGTT
2101 TTCACTTAGC ATATCGTGGG CATCTCCCCT TATGATTACT AAATATTTTA
2151 TTTTGGAGTG GCTGTGTACT CTCCCATTGA CTAGATGGAC CATTGTGCCA
2201 GTTGCCAATC ACTAATGCTG TTACTAACTT TTCAGTTATA AATTGATGAA
2251 TATCTTTGTG CACAGGCTGT TTCCCAATGT CAAGTTATTA GGGTAGACTC
2301 CAGGAGGTGG GATTCTTCAA CTAAAGAATA TGAAAACCTT TGAGGCTTTT
2351 ACTACATATT GACAAAATGG TTTCCGGAAA TATTTGTATC CCCTTACACT
2401 GCCACCAGCA AGGATAAACA TGTCCATCTT GCCCGTATTG GGAATTATCA
2451 TCTGGCTAAA TATTTGCTAA TTTGATAATG AAAAAATAGC ATCGTGTTTC
2501 AGTTGGCATT TCACTGACTT CTAGCACGGT TGAACATCTT TCATGTGGAG
2551 CGATTGTATT TCCTCCTTTG TGGATTGTCA GTGTCCTTTG CTCTATCTTC
2601 TGGGGTCAGA TAAATTTGTA TGAGCTCGGT ATATATTAAA GATATTAACC
2651 TGGTGTGTGT CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA



BLAST Results

Entry HS1033E15 from database EMBL:
Human DNA sequence from clone 1033E15 on chromosome 22q13.1-13.2. Contains part of a novel gene, ESTs and a GSS.
Score = Sin, P = 5.1e-262, identities = 1167/1115

Entry HSN12AA12 from database EMBL:
Human DNA sequence from cosmid N12SA12 on chromosome 22q12-qter. Contains ESTs and a CpG island.
Score = 503A, P = 0.0e+00, identities = 1014/1011

Entry HSb1034b from database EMBL:
human STS UI-14034.
Score = 1800, P = 1.4e-76, identities = 312/417

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 196 bp to 831 bp; peptide length: 212

Category: putative protein
Classification: no clue

1 MVLFKRDRRE DTQQGHHSMN GRCTDHFLFV LSSLLSPAAI LVRLVPARER
51 CPQVKGYSGT WEKAPGRFPC GPAQHGSRVG TLLCRQPSLY SSGFLRALPC
101 LCQACAASHP TAAWERPATL PVHTLPVHTL PVHNCSRALC LWAPNPSSCS
151 TFVWHGDLCF FSWCLCVWAW DECWYALRTF LIAPCTLEHG ADERGSGACP
201 PPWTWKKPTL ER

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_121o17, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphamy2_121o17, frame 1

Report for DKFZphamy2_121o17.1

[LENGTH] 212
[MW] 23727.55
[pI] 8.73
[KW] TRANSMEMBRANE 1

SEQ MVLFKRDRREDTQQGHHSMNGRCTDHFLFVLSSLLSPAAILVRLVPARERCPQVKGYSGT
PRD ccchhhhhcccccccccccccccccchhhhhhhhccccceeeeecccccccccccccccc
MEM MMMMMMMMMMMMMMMMM

SEQ WEKAPGRFPCGPAQHGSRVGTLLCRQPSLYSSGFLRALPCLCQACAASHPTAAWERPATL
PRD ccccccccccccccccccceeeeeccccccccccccccccchhhhhhccccccccccccc
MEM

SEQ PVHTLPVHTLPVHNCSRALCLWAPNPSSCSTFVWHGDLCFFSWCLCVWAWDECWYALRTF
PRD ccccccccccccccccceeeeecccccccceeeecccceeecccceeeeccchhhhhhhe
MEM

SEQ LIAPCTLEHGADERGSGACPPPWTWKKPTLER
PRD eeeccccccccccccccccccccccccccccc
MEM

(No Prosite data available for DKFZphamy2_121o17.1)

(No Pfam data available for DKFZphamy2_121o17.1)
group: signal transduction

DKFZphamy2_12d7 encodes a novel SSS amino acid protein, which is a previously unknown alternatively spliced form of the disks large homolog DLG2.

It appears to be expressed predominantly in the retina, germ cells and brain. It contains an SH3 domain and a guanylate kinase domain. These conserved regions are shared among members of the discs-large family of proteins, which include human p55, a membrane protein expressed in erythrocytes; rat PSD-95/SAP90, a synaptic protein expressed in brain; Drosophila dlg-A, a septate junction protein expressed in various epithelia; and human and mouse ZO-1 and canine ZO-2, two tight junction proteins. The Drosophila homologue dlg-A acts as a tumor suppressor. All members of this family may be involved in signal transduction.

The new protein can find application in modulating or blocking intracellular signal transduction pathways.

similarity to disks large homolog DLG2 (Homo sapiens)

alternative splicing: see DLG2
complete cds.

frame shift: around position 1137 one C too many
Sequenced by EMBL

Locus: /map="33fl-t cR from top of Chr17 linkage group"

Insert length: 4220 bp

Poly A stretch at pos- mflOi polyadenylation signal at pos. MlbS 



1 CCCGGCTGCG CTGGAGCCGC CCGGAGCTAG GGGCTTCCCG GGGCGCAGGA
51 GAGACGTTTC AGAGCCCTTG CCTCCTTCAC CATGCCGGTT GCCGCCACCA
101 ACTCTGAAAC TGCCATGCAG CAAGTCCTGG ACAACTTGGG ATCCCTCCCC
151 AGTGCCACGG GGGCTGCAGA GCTGGACCTG ATCTTCCTTC GAGGCATTAT
201 GGAAAGTCCC ATAGTAAGAT CCCTGGCCAA GGCCCATGAG AGGCTGGAGG
251 AGACGAAGCT GGAGGCCGTG AGAGACAACA ACCTGGAGCT GGTGCAGGAG
301 ATCCTGCGGG ACCTGGCGCA GCTGGCTGAG CAGAGCAGCA CAGCCGCCGA
351 GCTGGCCCAC ATCCTCCAGG AGCCCCACTT CCAGTCCCTC CTGGAGACGC
401 ACGACTCTGT GGCCTCAAAG ACCTATGAGA CACCACCCCC CAGCCCTGGC
451 CTGGACCCTA CGTTCAGCAA CCAGCCTGTA CCTCCCGATG CTGTGCGCAT
501 GGTGGGCATC CGCAAGACAG CCGGAGAACA TCTGGGTGTA ACGTTCCGCG
551 TGGAGGGCGG CGAGCTGGTG ATCGCGCGCA TTCTGCATGG GGGCATGGTG
601 GCTCAGCAAG GCCTGCTGCA TGTGGGTGAC ATCATCAAGG AGGTGAACGG
651 GCAGCCAGTG GGCAGTGACC CCCGCGCACT GCAGGAGCTC CTGCGCAATG
701 CCAGTGGCAG TGTCATCCTC AAGATCCTGC CCAGCTACCA GGAGCCCCAT
751 CTGCCCCGCC AGGTATTTGT GAAATGTCAC TTTGACTATG ACCCGGCCCG
801 AGACAGCCTC ATCCCCTGCA AGGAAGCAGG CCTGCGCTTC AACGCCGGGG
851 ACTTGCTCCA GATCGTAAAC CAGGATGATG CCAACTGGTG GCAGGCATGC
901 CATGTCGAAG GGGGCAGTGC TGGGCTCATT CCCAGCCAGC TGCTGGAGGA
951 GAAGCGGAAA GCATTTGTCA AGAGGGACCT GGAGCTGACA CCAAACTCAG
1001 GGACCCTATG CGGCAGCCTT TCAGGAAAGA AAAAGAAGCG AATGATGTAT
1051 TTGACCACCA AGAATGCAGA GTTTGACCGT CATGAGCTGC TCATTTATGA
1101 GGAGGTGGCC CGCATGCCCC CGTTCCGCCG GAAAACCCTG GTACTGATTG
1151 GGGCTCAGGG CGTGGGACGG CGCAGCCTGA AGAACAAGCT CATCATGTGG
1201 GATCCAGATC GCTATGGCAC CACGGTGCCC TACACCTCCC GGCGGCCGAA
1251 AGACTCAGAG CGGGAAGGTC AGGGTTACAG CTTTGTGTCC CGTGGGGAGA
1301 TGGAGGCTGA CGTCCGTGCT GGGCGCTACC TGGAGCATGG CGAATACGAG
1351 GGCAACCTGT ATGGCACACG TATTGACTCC ATCCGGGGCG TGGTCGCTGC
1401 TGGGAAGGTG TGCGTGCTGG ATGTCAACCC CCAGGCCGGT GAAGGTGCTA
1451 CGAACGGCCG AGTTTGTCCC TTACGTGGTG TTCATCGAGG CCCCAGACTT
1501 CGAGACCCTG CGGGCCATGA ACAGGGCTGC GCTGGAGAGT GGAATATCCA
1551 CCAAGCAGCT CACGGAGGCG GACCTGAGAC GGACAGTGGA GGAGAGCAGC
1601 CGCATCCAGC GGGGCTACGG GCACTACTTT GACCTCTGCC TGGTCAATAG
1651 CAACCTGGAG AGGACCTTCC GCGAGCTCCA GACAGCCATG GAGAAGCTAC
1701 GGACAGAGCC CCAGTGGGTG CCTGTCAGCT GGGTGTACTG AGCCTGTTCA
1751 CCTGGTCCTT GGCTCACTCT GTGTTGAAAC CCAGAACCTG AATCCATCCC
1801 CCTCCTGACC TGTGACCCCC TGCCACAATC CTTAGCCCCC ATATCTGGCT
1851 GTCCTTGGGT AACAGCTCCC AGCAGGCCCT AAGTCTGGCT TCAGCACAGA
1901 GGCGTGCACT GCCAGGGAGG TGGGCATTCA TGGGGTACCT TGTGCCCAGG
1951 TGCTGCCCAC TCCTGATGCC CATTGGTCAC CAGATATCTC TGAGGGCCAA
2001 GCTATGCCCA GGAATGTGTC AGAGTCACCT CCATAATGGT CAGTACAGAG
2051 AAGAGAAAAG CTGCTTTGGG ACCACATGGT CAGTAGGCAC ACTGCCCCTG
2101 CCACCCCTCC CCAGTCACCA GTTCTCCTCT GGACTGGCCA CACCCACCCC
2151 ATTCCTGGAC TCCTCCCACC TCTCACCCCT GTGTCGGAGG AACAGGCCTT
2201 GGGCTGTTTC CGTGTGACCA GGGGAATGTG TGGCCCGCTG GCAGCCAGGC
2251 AGGCCCGGGT GGTGGTGCCA GCCTGGTGCC ATCTTGAAGG CTGGAGGAGT
2301 CAGAGTGAGA GCCAGTGGCC ACAGCTGCAG AGCACTGCAG CTCCCAGCTC
2351 CTTTGGAAAG GGACAGGGTC GCAGGGCAGA TGCTGCTCGG TCCTTCCCTC
2401 ATCCACAGCT TCTCACTGCC GAAGTTTCTC CAGATTTCTC CAATGTGTCC
2451 TGACAGGTCA GCCCTGCTCC CCACAGGGCC AGGCTGGCAG GGGCCATTGG
2501 GCTCAGCCCA GGTAGGGGCA GGATGGAGGG CTGAGCCCTG TGACAACCTG
2551 CTGTTACCAA CTGAAGAGCC CCAAGCTCTC CATGGCCCAC AGCAGGCACA
2601 GGTCTGAGCT CTATGTCCTT GACCTTGGTC CATTTGGTTT TCTGTCTAGC
2651 CAGGTCCAGG TAGCCCACTT GCATCAGGGC TGCTGGGTTG GAGGGGCTAA
2701 GGAGGAGTGC AGAGGGGACC TTGGGAGCCT GGGCTTGAAG GACAGTTGCC
2751 CTCCAGGAGG TTCCTCACAC ACAACTCCAG AGGCGCCATT TACACTGTAG
2801 TCTGTACAAC CTGTGGTTCC ACGTGCATGT TCGGCACCTG TCTGTGCCTC
2851 TGGCACCAGG TTGTGTGTGT GTGCGTGTGC ACGTGCGTGT GTGTGTGTGT
2901 GTGTCAGGTT TAGTTTGGGG AGGAAGCAAA GGGTTTTGTT TTGGAGGTCA
2951 CTCTTTGGGG CCCCTTTCTG GGGGTTCCCC ATCAGCCCTC ATTTCTTATA
3001 ATACCCTGAT CCCAGACTCC AAAGCCCTGG TCCTTTCCTG ATGTCTCCTC
3051 CCTTGTCTTA TTGTCCCCCT ACCCTAAATG CCCCCCTGCC ATAACTTGGG
3101 GAGGGCAGTT TTGTAAAATA GGAGACTCCC TTTAAGAAAG AATGCTGTCC
3151 TAGATGTACT TGGGCATCTC ATCCTTCATT ATTCTCTGCA TTCCTTCCGG
3201 GGGGAGCCTG TCCTCAGAGG GGACAACCTG TGACACCCTG AGTCCAAACC
3251 CTTGTGCCTC CCAGTTCTTC CAAGTGTCTA ACTAGTCTTC GCTGCAGCGT
3301 CAGCCAAAGC TGGCCCCTGA ACCACTGTGT GCCCATTTCC TAGGGAAGGG
3351 GAAGGAGAAT AAACAGAATA TTTATTACAA ATGTTAGAAT ATATTTCTTA
3401 TACTAGGAAT CTCATTTGCA TTTGCATAGA CTATACACAT GGGGTGGAAA
3451 GGCCAGGCCT GCCCCCATCT CGTTGGTGTG GCTCTGCGTA TACTACACAC
3501 TCATTCTCCT GCTCCTCTTT TCCCTTAGTC AGTGTCCTTT CATCCTGATT
3551 CAGCTCTGCC TTGCATCACC CTCAGCCTAA GGGAGTGGGA AGGAAATGGG
3601 GTGTTTTCTT GCTGACCTGA GGCTATAGGG TCACTTGCCA TTTCCTACCT
3651 TCTCTGGGGG ATTTGAGGGT AGAGGCAGGG GAAGATCTGT TGTTGCAGTT
3701 GCTTCTGCCC CCTTGATCCA AATGACCATC ATCTCTGATG GAGATGGGTT
3751 GGGTACCTGG CCTTCATGGC ACCTTCACTG CTAGGGATGC TCAAGGGGCA
3801 GGCCTGGGGC CCTTCCCTCC TGTCTCTTCT CGGTCTTTCC TCTCTGAGCA
3851 GCCTCCTACC TCCCCTGCCT GAGCCCTCAC TCCACAGCCC TCCCAGGTAC
3901 CTAGCAGAGG CTGTCAGTCC TTGGCTCACC TGGAACAGGG CTGGGGCTGG
3951 GTTGGAACAG GTGTGTGCCC CCACCACAGC TCTATGACTC TGTTCTCCCT
4001 CCCTGCCATT GTGGACTCTT GTATTTGAGG GACCTCAAGA GAGTGAGGAC
4051 CCTACCATCC ACTGTCCATA TTCAGTCCCA GCCCCAGTGC GCTTCCTCTG
4101 TTCCCTCCCT CAGCCATCCA ATTCTTGAGT TTTCTCACTG ATTGGTTTTC
4151 TTTCTTTTTC CTTGGATTAA ATGTGAAAGC AAAGAAAAAA AAAAAAAAAA
4201 AAAAAAAAAA AAAAAAAAAA



BLAST Results 



No BLAST result 

Medline entries 



20 1fc070426: 

flazoyer Si Gayther SAi Nagai NAi Smith SAi Dunning Ai van 
Rensburg EJ-i 
Albiertsen Hi White Ri 

Ponder BA-n A gene (DLG2) located at 17ql2-q21 encodes a new 
25 homologue 
of 

the Drosophila tumor suppressor dlg-A- Genomics 1115 Jul 
lnBfi(l) :E5-31 



Peptide information for frame 1 



ORF from 62 bp to 1437 bpi peptide length: 452 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 
Prosite motifs: GUANYLATE__KINASE_1 (365-402) 



1 PIPVAATNSET ANfidVLDNLG SLPSATGAAE LDLIFLRGIH ESPIVRSLAK 

51 AHERLEETKL EAVRDNNLEL V(2EILRDLA(3 LAEUSSTAAE LAHILflEPHF 

1D1 (2SLLETHDSV ASKTYETPPP SPGLDPTFSN flPVPPDAVRM VGIRKTAGEH 

45 151 LGVTFRVEGG ELVIARILHG GMVAflflGLLH VGDIIKEVNG CPVGSDPRAL 

2D1 (2ELLRNASGS VILKILPSYfl EPHLPRflVFV KCHFDYDPAR DSLIPCKEAG 

251 LRFNAGDLLfJ IVNdDDANWU (3ACHVEGGSA GLIPSfiLLEE KRKAFVKRDL 

3D1 ELTPNSGTLC GSLSGKKKKR MHYLTTKNAE FDRHELLIYE EVARHPPFRR 

351 KTLVLIGAflG VGRRSLKNJCL IMUDPDRYGT TVPYTSRRPK DSEREGflGYS 

50 4D1 FVSRGEMEAD VRAGRYLEHG EYEGNLYGTR IDSIRGVVAA GKVCVLDVNP 
MSI <2A 



55 BLASTP hits 

No BLASTP hits available 
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WO 01/98454 PCT/IB01/02050 
Alert BLASTP hits for DKFZphamy2_12d7-. f raroe i 

No Alert BLASTP hits found 



Peptide information for frame 2 



ORF from H431 bp to 173B bp; peptide length: 100 
10 Category: strong similarity to known protein 
Classification: Cell signaling/communication 
Prosite motifs: LEUCINE_ZIPPER <bb-fl?) 



15 1 VKVLRTAEFV PYVVFIEAPD FETLRAMNRA ALESGISTKfl LTEADLRRTV 

51 EESSRItJRGY GHYFDLCLVN SNLERTFREL (2TAMEKLRTE PflWVPVSUVY 



20 BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_12d7-> frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphamy2_12d7i frame 1 



25 



30 



Report for ' DKFZphamy2_12d?.l 



EH • influenzae! 



ELENGTHJ Sib 

35 Ell U 3 SbMSfl.3b 

EpIJ b-21 

EHOPIOLl PIR:AS7bS3 disks large homolog DLG2 - human □-□ 

EFUNCATJ 01.03.11 other nucleotide-metabolism activities ES. 
cerevisiae-, Y»R1Si»c3 7e-15 

40 EFUNCAT3 f nucleotide metabolism and transport 
HI17M33 3e-07 

EBLOCKSJ PR00fl3»4F 

EBLOCKSJ BLOOfiSbC 

EBLOCKSJ BLOOfiSbB Guanylate kinase proteins 

45 EBLOCKSJ BLOOfiSbA Guanylate kinase proteins 

ESCOPJ dlgky 3-21.1. 1-1 Guanylate kinase Ebaker's 

yeast (Saccharomyce fle- l !5 

ESCOPJ dlkuab_ 2. 2b. 1.1. 2 Cask/Lin-2 

sapiens) Me-3M 

50 EEC! 2.7-M.fl Guanylate kinase fle-17 

EPIRKtO blocked amino end fle-17 

EPIRKIO phosphotransferase fle-1? 

EPIRKIO monomer fle-17 

EPIRKIO duplication Se-21 

55 EPIRKIO signal transduction 3e-2i» 

EPIRKliU alternative splicing Se-21 

EPIRKIO P-loop fle-17 

EPIRKtO acetylated amino end le-lb 

-82- 



EHuman (Homo 



WO 01/98454 PCT/IB01/02050 

EPIRKliD membrane protein Te-7M 

EPIRKU3 magnesium 6e-17 

EPIRKUJ ATP fle-17 

ESUPFAIU SH3 homology E ^e-7^ 

5 ESUPFAril discs-large tumor suppressor 36-24 

ESUPFAIU unassigned Ser/Thr or Tyr-specific protein kinases Se- 
ll 

ESUPFAMJ protein kinase homology Se-11 

ESUPFAfO GLGF domain homology le-74 

10 ESUPFAI1J guanylate kinase Se-17 

ESUPFAMJ guanylate kinase homology Te-71 

EPR0SITE3 GUANYLATE_KINASE_1 1 

EPFAfU Src homology domain 3 

EKWJ Irregular 

15 IKW1 3D 



20 



25 



SEC MPVAATNSETAIKJflVLDNLSSLPSATGAAELDLIFLRGiriESPIVRSLAKAHERLEETKL 
Igky- 

SEfl EAVR»NNLELV(3EILRI>LA(2LAE(3SSTAAELAHIL(2EPHF(JSLLETHDSVASKTYETPPP 
Igky- 

SE<3 SPGLDPTFSNl2PVPPDAVRI1VGIRKTAGEHLGVTFRVEGGELVIARILHGG[1VA<2<2GLLH 
Igky- 



30 SEC VGDIIKEVNG(2PVGSDPRAL<2ELLRNASGSVILKILPSY<3EPHLPR<2VFVKCHFI>YDPAR 
Igky- 

SEfl DSLIPCKEAGLRFNAGDLL(2IVNl3DDANli)lil(JACHVEGGSAGLIPS(3LLEEKRKAFVKR]>L 
35 Igky- 

SE<2 ELTPNSGTLCGSLSGKKKKRIU1YLTT<NAEFDRHELLIYEEVARHPPFRRKTLVLIGAt3G 
lgky- 

40 CCEEEECTTT 

SE<2 VGRRSLKNKLIMWDPDRYGTTVPYTSRRPOSEREGfiGYSFVSRGEflEADVRAGRYLEHG 
Igky- 

TCHHHHHHHHHHHTTTTEEECCEEECCCCTTTTTTTTTTEECCHHHHHHHHHHCCEEEEE 

45 

SEfl EYEGNLYGTRIDSIRGVVAAGKVCVLDVNPflAGEGATNGRVCPLRGVHRGPRLRDPAGHE 
Igky- 

EETTEEEEEEHHHHHHHHHHCCEEEEECCHH 

50 SE<2 (JGCAGElilNIHiJAAHGGGPETDSGGEflPHPAGLRALL 

igky- 



55 Prosite for DKFZphamy2_15d7. 1 

PSDDflSt. 3fl5->4Q3 GUANYLATE_KINASE_1 PD0CD0b7D 
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PCT/IB01/02050 



Pfam for DKFZphamy2_12d7 • 1 

5 

HNM_NAriE Src homology domain 3 
Will 

*pyVIALYDYqAqd pDELSFkEGDIIillEdsDD • UlUrgRnnn 

10 +V+ +DY++ + + L F GD ++I++++D+ lilU + 

fiuery 22fi 

VFVKCHFDYDPARDSLIPCKEAGLRFNAGDLL<nVNtfDDANIdUflACHv"E 27b 

Hnn TNGflEGWIPSNYVEPi* 
15 ++ G+IPS +E+ 

fluery 27? GG-SAGLIPS<2LLEEK 211 



20 Pedant information for DKFZphamy2_12d7i frame 2 

Report for DKFZphamy2_12d7 • 2 



25 



40 



50 



55 



ELENGTHJ 175 
EMUJ 11721. ID 



EpIJ Lbl 

CHOriOLU PIR:A57b53 disks large homolog DLG2 - human 7e-S3 

30 CPIRKliU membrane protein le-13 

ESUPFAfD SH3 homology le-13 

ESUPFAM3 GLGF domain homology le-13 

ESUPFAPO guanylate kinase homology le-13 

EPR0SITE3 LEUCINE_ZIPPER 1 

35 EKbO Alpha_Beta 



SE(3 riAPRCPTPPGGRKTflSGKVRVTALCPVGRURLTSVLGATIilSriANTRATCriAHVLTPSGAU 

PRD ccccccccccccccccceeeeeeeccccccceeeeeccccccchhhhhhhhhhccccccc 

SE<2 SLLGRCACUHSTPRPVKVLRTAEFVPYVVFIEAPDFETLRAIINRAALESGISTKiSLTEAD 

PRD ccccceeeeecccchhhhhhhhhcceeeeeeeccchhhhhhhhhhhhhccccchhhhhhh 



SEfl LRRTVEESSRIflRGYGHYFDLCLVNSNLERTFRELdTAMEKLRTEPdUVPVSWVY 
45 PRD hhhhhhhhhhhhhhhhhheeeeeecccchhhhhhhhhhhhhhhhccccccccccc 



Prosite for DKFZphamy2_12d7.2 
PSDD021 !Ml->lb3 LEUCINE_ZIPPER PD0C00021 

(No Pfam data available for DKFZphamy2_12d7-2) 
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WO 01/98454 
DKFZphamy2_12g? 



PCT/IB01/02050 



group: amygdala derived 

DKFZphamy2_12g? encodes a novel 254 amino acid protein without 
similarity to known proteins- 
No informative BLAST resultsi No predictive prosite-i pfam or SCOP 
motif e • 

The new protein can find application in studying the expression 
profile of amygdala-specific genes- 



putative protein 
Sequenced by EHBL 
Locus: unknown 
Insert length: 12S7 bp 

No poly A stretch foundi no polyadenylation signal found 



1 CTCCAAGACT TCCTTGCTGT 
SI GGCTGTTTAC TCCACAGAGT 
1D1 CGAAGGCCCT GTGGGAATGA 
1S1 TCTGCCCCCA GGAAAGCAGC 
2D1 CTGCCCTGGA CTCCGTCGTC 
251 TCCCGGAAGG GCAGCGCGCT 
301 GACGGGGCTC TTCGAGCTAA 
351 CCGCCAGCGT GTCCCACCCT 
401 AGCAGCCCCA GAAGCCCTGC 
MSI TAGCCTGGGC CGCTCCCAGT 
SD1 ACCTCATGAG GTCGGGCAGT 
551 TGGCCTCTCC TGTTGGCCGC 
bOl TTGTCTCTGG ACAAGATTGC 
fc.51 ACCCGTCAAG TAGCACCGTG 
701 CCCAACGCCC CCAGAGGGTA 
7S1 ATGCCTCCCA GACGGGGGTG 
601 TCTGCTGATG AGGGATGGGG 
fiSl GGGACTAAGC CACCAGTATT 
101 ACCCCTAGGC CAGGGCAAGG 
1S1 TACCCTGGTT CTGAGTTTAC 
1DD1 TTCCCACCTC TGACATTCCC 
1QS1 GTGGTGATGG CTAAGGGCCC 
1101 GGGACAGGGC CAGGTAGCCT 
1151 AAGCACTTTA ATTTTTTTTT 
1201 AACTATTGCT TCCAACT6AA 
1251 AAAAAAA 



GAGGCTCGTG TGGACCCCAG AGCATGCACA 
GGCTTTGAGA ATCAGATGAG ACTGTGCTGG 
GGAACGCTGT AGTGTTTGCT GGTCCCTGTT 
TGTGTGAGGA GGAGCGCCGG GCCATGCAGG 
TGCCACACGC CCCTCAACAA CCTTGGCTTT 
CACCTTCAGT GTGGCCTTCC AGGCTCTGAG 
GCCAGCACAT GAAACTGAAG CTGCAGTTCA 
CCACCCGAGG CCCGGCCCCT CTCCCGCAAG 
TGTCCGGGAC TTGGTGGAGA GGCATCAGGC 
CCTTCTCCCA CCAGCAGCCT TCCCGAAGCC 
GTGATGGAGC GCAGAGCATC ACGCCCCCTG 
CCCCTCTACC TGCCCCCGGA CAAGGCTGTG 
CAAGCGCGAG TGCAAGGTCC TGGTGGTGGA 
CCAGCTCTGT TCCCTCTTAC ACTCCAGAGA 
TCCTT6CTCC CGGGCTGTGC CTCCCCTGGG 
AAGAGGCCTG GCAGAGCTGC CTGTCTTGTG 
GAAGAAGCTG TGAAGTGGGC GGGCATGGCT 
CCCCGACGTT CCTGTGGGGG GGGCTGGCCC 
GTTCCCAGAG CTCCCTTGTC CCCGGCCCTT 
AAAGTCTCTT CCTCATTCCC GTTGAGTTCT 
TCCCTCCCTC CCGCAGGCTG AGATTAGAGG 
CTGACAGTGA CCTTCCTGTC TCAGG6GTTG 
CCTGCCCCTT ATGTTTACGT TTGCAGCCTG 
TTTTTGGTCT GTCCCTGTAA CTAATTTTCC 
ATAAGACTAT TAAATGCCTG TTCAGAGGGA 



BLAST Results 



No BLAST result 
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WO 01/98454 



PCT/IB01/02050 



Medline entries 



5 



35 



No Medline entry 



10 Peptide information for frame S 



ORF from bp to S05 bpi peptide length: ESM 
Category: putative protein 
15 Classification: no clue 

1 MHRLFTPflSG FENflMRLCUR RPCGNEERCS VCUSLFLPPG KfiLCEEERRA 

51 MfiAALDSVVC HTPLNNLGFS RKGSALTFSV AFflALRTGLF ELSflHMKLKL 

101 (2FTASVSHPP PEARPLSRKS SPRSPAVRDL VERHflASLGR S<3SFSHfl<2PS 

20 151 RSHLMRSGSV IIERRASRPLIil PLLLAAPSTC PRTRLCCLUT RLPSASARSW 

201 UUNPSSSTVP ALFPLTLfiRP NAPRGYPCSR AVPPLGCLPD GGEEAlilflSCL 

251 SCVC 



25 

BLASTP hits 

No BLASTP hits available 
30 Alert BLASTP hits for DKFZphamy2_12g7 frame 2 

No Alert BLASTP hits found 

Pedant information for DKFZphamy2_12g7-i frame 2 



Report for DKFZphamy2_12g7 .2 



40 CLENGTHJ EBH 

CMIill E&H7^ • Tl 
EpIJ 

CBL0CKS1 BL01013C Oxysterol-binding protein family proteins 

EKWJ Alpha_Beta 

45 IKW1 L0U_C0MPLEXITY M-72 '/. 



SE<2 MHRLFTPiJSGFENiStlRLCURRPCGNEERCSVCUSLFLPPGKflLCEEERRAMfiAALDSVVC 

SEG • 

50 PRD ccccccccccccchhhhhhcccccccceeeeeeeeeccccccchhhhhhhhhhhhhheee 

SE(2 HTPLNNLGFSRKGSALTFSVAF(2ALRTGLFELS<2Hf1KLKLC!FTASVSHPPPEARPLSRKS 

SEG ••xxxxxxx 

PRD cccccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccc 

55 

SEfl SPRSPAVRDLVERH(3ASLGRS(3SFSH(2l2PSRSHLMRSGSVI1ERRASRPLUPLLLAAPSTC 

SEG xxxxx 

PRD ccccchhhhhhhhhhhhcccccccccccccceeeecccchhhhhhccccccccccccccc 
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WO 01/98454 PCT/IB01/02050 

SEC? PRTRLCCLWTRLPSASARSWUWNPSSSTVPALFPLTLCSRPNAPRGYPCSRAVPPLGCLPD 

SEG 

PRI> cccceeeeeccccccccceeeccccccccccccccccccccccccccccccccccccccc 

SE(3 GGEEAUflSCLSCVC 

SEG 

PRD cchhhhhhhhhccc 



(No Prosite data available for DKFZphamy2_12g7 • 2) 
(No Pfam data available for DKFZphamy2_12g7. 2) 
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WO 01/98454 
DKFZphamy2_12il 



PCT/IB01/02050 



5 group: amygdala derived 

DKFZphamy2_12il encodes a novel 263 amino acid protein with weak 
similarity to FmEfc>.3 of Caenorhabditis elegans. 

10 No informative BLAST results^ No predictive prositei pfam or SCOP 
motife- 



The new protein can find application in studying the expression 
profile of amygdala-specific genes. 

15 



putative protein 
Sequenced by EMBL 

20 

Locus: /map="3 n 
Insert length: 252fi bp 

Poly A stretch at pos. 2515i polyadenylation signal at pos- 2M11 

25 



1 ATATAGTTGG ATCAAACAAA AACAACACAA TTTGTCCCGA TAATTATCAA 
51 ACAGCACAGC TACTTGCCTT AATTTTAGAG TTACTCACAT TTTGTGTGGA 
101 ACATCACACA TATCACATAA AAAACTATAT TATGAACAAG GACTTGCTAA 
30 1S1 GAAGAGTCTT GGTCTTGATG AATTCAAAGC ACACTTTTCT GGCCTTGTGT 
2D1 GCCCTTCGCT TTATGAGGCG GATAATTGGA CTTAAAGATG AATTTTATAA 
251 TCGTTACATC ACCAAGGGAA ATCTTTTTGA GCCAGTTATA AATGCACTTC 
301 TGGATAATGG AACTCGGTAT AATCTGTTGA ATTCAGCTGT TATTGAGTTG 
351 TTTGAATTTA TAAGAGTGGA AGATATCAAG TCTCTTACTG CCCATATAGT 
35 M01 TGAAAACTTT TATAAAGCAC TTGAATCGAT TGAATATGTT CA6ACATTCA 
M51 AAGGATTGAA GACTAAATAT GAGCAAGAAA AAGACAGACA AAATCAGAAA 
5D1 CTGAACAGTG TACCATCTAT ATTGCGTAGT AACAGATTTC GCAGAGATGC 
551 AAAAGCCTTG GAAGAGGATG AAGAAATGTG GTTTAATGAA GATGAAGAAG 
bOl AGGAAGGAAA AGCAGTTGTG GCACCAGTGG AAAAACCTAA GCCAGAAGAT 
40 b51 GATTTTCCAG ATAATTATGA AAAGTTTATG GAGACTAAAA AAGCAAAAGA 
701 AAGTGAAGAC AAGG'AAAACC TTCCCAAAAG GACATCTCCT GGTGGCTTCA 
751 AATTTACTTT CTCCCACTCT GCCAGTGCTG CTAATGGAAC AAACAGTAAA 
SOI TCTGTAGTGG CTCAGATACC ACCAGCAACT TCTAATGGAT CCTCTTCCAA 
B51 AACCACAAAC TTGCCTACGT CAGTAACAGC CACCAAGGGA AGTTTGGTTG 
45 =101 GCTTAGTGGA TTATCCAGAT GATGAAGAGG AAGATGAAGA AGAAGAATCG 
151 TCCCCCAGGA AAAGACCTCG TCTTGGCTCA TAAAATATTT ATTAGGGGAC 
1D01 CCTCAACATG TGGTCTTACA ATGCTGCAAC TGTTCAGTGA GCTGAAAATC 
1051 TGAATCAGAA AGCTTTCTCA ATTGAACTTA TAAAATATAC AAGGAGTAGC 
1101 AAAAGACAGT ATATCAGCTA AGAGAGTTTA GTTCTAATAA AAATCAGGCT 
50 1151 TCCCAGGAAC TTGATTGCTT GCTAGTAATT AAGGGGTTTG CCTTTTAGGC 
1201 TGTCAAAACA AACATTAGTA ACCAGAACCT GGGAGATAGC TTCTCAGCAA 
1251 GGAAAAGTCA CAGGTTTGGG GACGGTTTAG GGGAGGGGAA AAGGTTGATA 
1301 TAATAATGCA GGGTTGCTCC TCGGGGTGTC GATCTAGAAA CAATTTTACA 
1351 GAACTTCAGT TGTAAACTCA ATAACATTAC TTGTATAATG GTGCTGGCCA 
55 1M01 TGTTGTTGTT TTAATCAGTT GCCTCTTTTT AAAAGAAATT TTTATGGAAA 
mSl ACACATTCAA CTATCATTAA AAAAATGAAG TTAAGCTGTT GGGACCATTT 
1501 CTTTAAGATT TAACAAAAGT TCAGCCTTTT AGGTAGTTGA AGGGAAGTAC 
1551 ACCCCGTATT CAGCACATGT TGAGTTTTCT ACACCAGGAA TTTTCAATAT 
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WO 01/98454 



PCT/IB01/02050 



IbOl GTATATTGAT GAAAACAAGC TCAATTCAAA CTGGACAGTT TTAAGATAAT 

lbSl GTTAAAATCA GCACTTTTAG AGACAACGAA GGCCAAGAAT CAGTACAGTA 

17D1 GTATTCCAAA ATGATTTTCT CTAGAAATTT GAAAGTAGAT CGAACAGAAT 

1751 GTTGTCAACC GCCTACCAGT ACAATCTTTT GTGGAAGATA CTTTGAAATC 

1601 ACTTTCTACT TTGTTAGTAA AGTTCTGTCT TTCCAGAGCT GCAAGTTTTA 

1651 AAGTGTTACT TATACAGACC AACCAAGAAT AGTGCTGAAT TAAGTGGCAT 

11D1 TTAGTATCTA GAAGCCATTT TGATCCAAGA AGCTACTTAA GTGTCAAAGT 

1=151 CAGCATGCAG CACATGTAGC TTTTCTGTAA ACAAGGGTGT GATATGAAAG 

2001 CTGCTTTTTT AAGAAGAGTA AAAGCACATT CCATATACGT AAGTGA ATTT 

SD51 TAAAAATAAA TTGAGGCAAA CAGTTAAGTT TTATTTTTAG AGCAACAAGT 

2101 TAACTGTAAA TATTTTAATG TTAGTTTGCT CATCTATGAT CTGAGATCAT 

2151 GCCGAAGTGA GAAAAATCTC CCCAAAATAC AATTTAATGC ATTGGGAAAA 

2201 AAAAACTTTA ACAGTAATTC CAGCCACAAT CTTTAGATCA CCCTTGTAAT 

2251 GTGTTACGGG TCCATTTTTC CTGGAATCGT TTAATCTAAA GCAGTTTCCC 

2301 CTGTTTTGGA GATTTTGTAG TTAATTTTAA TTTTGGCTAT TGTTTGGAAA 

2351 AGATGAGCTG TCTGTGTAGA TATGAAGTAT AGTTTTTTCC ATAAAACAGA 

2MD1 TGTTTATTTT GTATTAAAAA ATACCACTGT ACTTGTTTTA CACCATTTGT 

2151 ATACATGTGG TGATATTAAT GCTAAACTGT AAAATTCAGG AATTAAAATG 
2501 TGACCCTGTA ATTCCAAAAA AAAAAAAA 



BLAST Results 



Entry AF01bma_fl from database TREMBL : 

gene: n F41Eb.3 n i Caenorhabdi tis elegans cosmid FMlEb . 

Score = 310-1 P = 5-Qe-32i identities = 73/lflli positives = 

llfl/lfiMi 

frame +3 

Entry HS21125fc, from database EMBL : 
human STS SHGC-15flM4- 
Score = 177i P = 5.5e-3fl-. identities = m/202 



iledline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 132 bp to TAD bp} peptide length: 5fl3 
Category: putative protein 
Classification: no clue 



1 ilNKDLLRRVL VLNNSKHTFL 
51 PVINALLDNG TRYNLLNSAV 
1D1 EYVfiTFKGLK TKYEUEKDRfl 
151 FNEDEEEEGK AVVAPVEKPK 
201 TSPGGFKFTF SHSASAANGT 
251 TKGSLVGLVD YPDDEEEDEE 



ALCALRFF1RR IIGLKDEFYN RYITKGNLFE 
IELFEFIRVE DIKSLTAHIV ENFYKALESI 
NflKLNSVPSI LRSNRFRRDA KALEEDEEdbJ 
PEDDFPDNYE KFHETKKAKE SEDKENLPKR 
NSKSVVAtJIP PATSNGSSSK TTNLPTSVTA 
EESSPRKRPR LGS 
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WO 01/98454 PCT/IB01/02050 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_12ili frame 3 
No Alert BLASTP hits found 
10 Pedant information for DKFZphamy2_12ili frame 3 



5 



15 



40 



Report for DKFZphamy2_12il.3 



ELENGTHJ 32b 
EMtiO 372bl.lD 



Cpll S-bO 

CHOMOLI TREHBL : AFDlb i ma_a gene: "F41Eb • 3 ns i Caenorhabditis 

20 elegans cosmid FMlEb • le-3b 

EFUNCAT3 01.Q5.Q4 regulation of carbohydrate utilization IS. 

cerevisiae-. YNL201cJ 2e-0fl 

EBL0CKS1 BLDD3S7 Histone H2B proteins 

EBL0CKS1 BPD2232B 

25 EBL0CKS1 PR01D73C 

EBL0CKS1 BPD3QSDC 

CBL0CKS3 BPa3SSDF 

EBLOCKSH PRQ0ai3F 

EKtO All_Alpha 

30 EKIiD L0U_C0MPLEXITY 10-43 V. 

SEfl IVGSNKNNTICPDNYflTAflLLALILELLTFCVEHHTYHIKNYIIINKDLLRRVLVLIINSKH 

SEG xxxxxxxxx 

35 PRD cccccccccccccchhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccch 

SEC TFLALCALRFHRRIIGLKDEFYNRYITKGNLFEPVINALLDNGTRYNLLNSAVIELFEFI 

SEG 

PRD hhhhhhhhhhhhhhhhccchhhhhccccccchhhhhhhhhcccccccccchhhhhhhhhh 



SEfl RVEDIKSLTAHIVENFYKALESIEYVflTFKGLKTKYEflEKDRflNflKLNSVPSILRSNRFR 

SEG 

PRD hheeehhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhcccccccccccccccchhh 



45 SEfl RDAKALEEDEEflWFNEDEEEEGKAVVAPVEKPKPEDDFPDNYEKFIIETKKAKESEDKENL 

SEG xxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhccccccceeeeeeeccccccccccccchhhhhhhhhhhhcccccc 

SEfl PKRTSPGGFKFTFSHSASAANGTNSKSVVAfllPPATSNGSSSKTTNLPTSVTATKGSLVG 

50 SEG 

PRD ccccccccceeeeccccccccccccceeeeecccccccccccccccccccccccccccee 

SEfl LVDYPDDEEEDEEEESSPRKRPRLGS 

SEG xxxxxxxxxx 

55 PRD eeccccccchhhhhcccccccccccc 

(No Prosite data available for DKFZphamy2_12il- 3) 
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WO 01/98454 PCT/1B01/02050 
(No Pfam data available for DKFZphamy2_lBil . 3 ) 
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WO 01/98454 
DKFZphamya_13gn 



PCT/1B01/02050 



5 group: amygdala derived 

DKFZphamy2_13gl c l encodes a novel 2fll amino acid protein without 
similarity to known proteins- 

10 The novel protein contains a PROSITE ASP_PR0TEASE motif and seem 
to be expressed 
Ubiquitously- 

No informative BLAST resultsi No predictive prositei pfam or SCOP 
motif e . 

15 

The new protein can find application in studying the expression 
profile of amygdala-specific genes. 



20 unknown protein 

perhaps complete cds- 
Pedant: SIGNAL_PEPTT.DE 

25 Sequenced by EMBL 

Locus: /chromosome="12pl3-3 n 

Insert length: 5751 bp 
30 Poly A stretch at pos. 2743i polyadenylation signal at pos- 2724 



1 GCAATCTCGG GAAATTGGAG ACTGACGCGG CTGCTCCTGC ATGTTATTTA 

51 TTTTTCCTCT TTCCCTCCCG TGGAGACCCT CCTGTTGGAA AGAGAGCTGC 

35 101 AGCACGGGAC AGAGACAGGC AGGAAGAAGC AGAGAGGACT CGGTGACGCC 

151 CCCACCGAGC AGCCCCTGGC CCACTCCTCC AGCAGGGGCC ATGAGCACCA 

201 AGCAGGAGGC CAGGAGAGAT GAGGGAGAAG CCAGGACGAG GGGGCAGGAG 

251 GCACAGCTTC GAGACCGAGC CCACCTGAGC CAGCAGCGCC GGCTCAAACA 

301 GGCCACCCAG TTCCTGCACA AGGACTCGGC CGACCTGCTC CCGCTGGACA 

40 351 GCCTCAAGAG GCTCGGCACC TCCAAGGACT TGCAGCCGCG CAGTGTGATC 

401 CAAAGACGCC TGGTGGAGGG AAACCCGAAT TGGCTTCAGG GGGAGCCTCC 

451 CCGGATGCAG GACCTGATTC ATGGCCAGGA GAGCAGGAGG AAGACCAGCA 

5D1 GGACAGAGAT TCCAGCTCTT CTGGTCAACT GCAAGTGCCA GGACCAGCTG 

551 CTTAGAGTGG CCGTTGACAC AGGCACCCAA TACAATCGGA TCTCTGCTGG 

45 bOl ATGTCTCAGC CGCCTGGGGT TAGAGAAAAG GGTCCTAAAA GCCTCAGCTG 

b51 GGGACCTGGC CCCTGGGCCC CCAACCCAGG TGGAGCAGTT GGAGCTACAG 

701 CTGGGGCAGG AGACTGTGGT GTGCTCGGCA CAGGTGGTGG ATGCTGAGAG 

751 TCCTGAATTC TGCCTGGGCC TGCAGACTCT GCTTTCTCTC AAGTGCTGCA 

501 TCGACCTGGA GCACGGAGTG CTGCGGCTGA AAGCCCCGTT CTCAGAGCTA 

50 A51 CCCTTCCTGC CTTTGTACCA AGAGCCTGGC CAGTGACTGC TGTCTCAGTC 

101 AGTCCCCAGA GGGAAAGACC TTGCCTTAGA AGAAGAGGCG TGTGGGGAAC 

151 GGGGGCTCTT GAAGCCAGGT AGCTGGGGAC TATGGTGTCT GCCCTTCCAA 

1001 TCACCTCCCT GACCCCTGCT GTCCCATTTT CCCCAGCTGG CCGCATTCCT 

1D51 CTCTGCTTCT CAGCAGCTGT CCTACTCCCC AGGACGAGTT TTCACTAGAG 

55 1101 GGCCCACGAT GCCAGGATTC TGATTCATCT TCCTCCCAAG AAAAGCAAAG 

1151 CCAAATCAAG ACCACAGATA GGAACCTAAG CACAATGGGG TGCCTGCTTG 

1B01 GGCTGGGTCG AAGGCTCTGC TGACTGCTGT CCTTGTCCAT CACCCAATAC 

1E51 CACCCCAAAC ACAACTCAAC TTCCCACACC ACCATGTCTC TCACCACACC 
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WO 01/98454 



PCT/IB01/02050 



1301 TTCTGGGCCT CATTATCTCC CACAACTAGA 

1351 ATGTCCCTGG ACCTCCTGGT GTCTGCCTCT 

1M01 TCACAGTTGA GTGGGGGAAG AAACAGCCAG 

mSl GGGAGTTAGT ATAGGAATGT CCATCTCATA 

5 1501 CTGTGGCTGC AAATGTCTGA AGCCAGTTAG 

1551 CCTTGGACAT ACTTCTGCTA TTAACGCTAT 

IbOl TGGCTTTTTG TACCCACCGA GCCCCTGAGC 

lb51 AGAGCCTTGT AGAGAACTGC TCCTGTGAGG 

1701 TCACCACTCA GACTTCACCT ATTCAGCATT 

10 1751 TCCACCTCAT TAGGCCTTCT TCCTATCCCC 

1601 AAGCTTGTAT TGTCCTGGAA TCAGTGGCTT 

1651 TGCCAAAGCA AAAAGACAGA GGCTTTTTTT 

nOl CTGTCAGGAG ACAGAGGCTT TTTTGAATTC 

nSl AACCTTAAGA CGCCAGATCC CTGAGAGTCT 

15 B001 TCAAATCATG GATTAGGAGT AAAGAAAGAG 

2051 CTGTAATCCC AGCACTTTGG GAGGCTGAGG 

2101 AGGA6TTTGA GACCAGCCTG GGTAATATGG 

2151 AAAATACAAA AATTAGCCAG GTATGGTGGT 

2201 ACTTGGAAGG CTGAGGCATA GGAGTTGCTT 

20 2251 GTAGTGAGCC AAGTTCGTGC CATCGGACTC 

2301 GACCCTGTCT CCAAAAACAA ACAAAAAAGG 

2351 CAGCTAACCT GAACAAGGGA ACTGGGACCG 

2M01 GCCTGGGGTT GACTGGGTTA GAGAAGAACC 

21*51 TGACACCTGG CCTGCCCTTT CTCAGCTGCC 

25 2501 GCCTCCCCTG CCCTCAGAAG GAAAGGAGAG 

2551 CATAGCACCT GGTCTCAAAA TCCTAAAAGC 

2b01 TGCTCCACAA GGTCCACTTT CCTGGGTCTT 

2b51 TGCCTCCTGC TGCTTCTGTA ACTGCAGACC 

2701 TCGGCTCAGC TGCTTCTCCA TTGGAATAAA 

30 2751 AAAA 



BLAST Results 



35 

No BLAST result 



Medline entries 

40 

No Nedline entry 



45 

Peptide information for frame 2 



ORF from 11 bp to flfl3 bpi peptide length: Bfll 
50 Category: putative protein 
Classification: no clue 
Prosite motifs: ASP_PROTEASE (173-lflH) 



55 1 HLFIFPLSLP WRPSCUKESC STGflRtfAGRS REDSVTPPPS SPIilPTPPAGA 

51 MSTKflEARRD EGEARTRGUE AflLRDRAHLS <2<2RRLK(3AT<3 FLHKDSADLL 

101 PLDSLKRLGT SKDLflPRSVI URRLVEGNPN ULflGEPPRMfl DLIHGUESRR 

151 KTSRTEIPAL LVNCKCflDfiL LRVAVDTGT<2 YNRISAGCLS RLGLEKRVLK 
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CCGCCATGCC TCACCAACCT 
CGGAGTCTGT GCACATCTGC 
AATTCAATAC AACAAAGAGC 
AGGCTGAGAG CTATTTTTTC 
TTTGATTACC CTGTGCAAAA 
AGGTATTTAT CCGTTTCCAC 
CTTGCGTGTG TGTGTGTGGA 
CAGACAGGAC AGTGAGGTTG 
CTTTCTGATT TCTAGAACTA 
ATCTCTGGCC TCTTGAGCTT 
TCTAACCCCC TGCCAGGCTT 
TTTTTTTTAA AGTTTGGGGT 
ACTGTGAAGA GAAGAACCCG 
TTCTGGCTGG TTTGAGTCTC 
GCAGGCGCAA TGGCTCATGC 
TGGGTGGATC ACTTGAGGTC 
CAAAACCCCA TCTCTACTAA 
GAACACCTGT AATCCCAGCT 
GAACCTGGGA GATGGGGGTT 
CAGCCTGGGT GAAGGAGTGA 
AGCAGAGAAA GACAGTGGTA 
TTGGGCTGAA ACA6TCTTGA 
GGGATGCAAG GAGCTGCCTG 
TCCCCTGCCC TTTCTCAGCT 
GGCTCACTTA TCACTTGTGC 
TTTCCTCGCC CTCACTGCCT 
GTGCTGTGCC TTTCCTTGTC 
CCAGGCCCAA TTGCAAGCCC 
CTCTTGTTTC TCTAAAAAAA 



WO 01/98454 PCT/IB01/02050 

201 ASAGDLAPGP PTl2VE(2LEL(2 LGflETVVCSA CVVDAESPEF CLGLfiTLLSL 
ESI KCCIDLEHGV LRLKAPFSEL PFLPLYflEPG (2 



BLASTP hits 

No BLASTP hits available 

10 Alert BLASTP hits for DKFZphamy2_13gM-. frame 2 

PIRrSSObMti hypothetical protein YER1M3W - yeast (Saccharomyces 
cerevisiae) -. N = li Score = -i P = 0-2b 

15 TREMBL : RNDDbd_l product: "DNA ( cy tosine-S- ) -methyltransf erase"i 
Rattus 

norvegicus mRNA for DNA (cytosine-S- ) -methyltransf erase-, partial 
cds« -i 

N = l-i Score = fill P = D.flT 



20 



25 



30 



45 



50 



>PIR:SS0bMb hypothetical protein YER1M3W - yeast (Saccharomyces 
cerevisiae) 

Length = ^^fl 

HSPs: 

Score = TO (13-5 bits)-. Expect = 3.De-01-, P = 2-be-Dl 
Identities = 26/112 (25X)-, Positives = i»fl/112 W/.) 



fluery: 1SS TEIPALLVNCKCflDflLLRVAVDTGTdYNRISAGCLSRLGLEKRVLKASAGD- 
— LAPGPP 211 

T++P L +N + + ++ VDTG a +S + GL + + K G+ 

+ G 

35 Sbjct: Ml 

TflVPHLYINIEINNYPVKAFVDTGAflTTIHSTRLAKKTGLSRfllDKRFIGEARGVGTGKI 2Sfl 

fluery: 212 XXXXXXXXXXXXXXXX- 
CSA<2VVDAESPEFCLGL<2TLLSLKCCIDLEHGVLRL 2b3 
40 CS V+D + + +GL L C+DL+ 

VLR+ 

Sbjct: 251 IGRIHflAflVKIETfiYIPCSFTVLDTDI- 
DVLIGLDMLKRHLACVDLKENVLRI 310 



Pedant information for DKFZphamy2_13gM-> frame 2 
Report for DKFZphamy2_13gM.2 



CLENGTH3 2fll 

IMbO 3133D. c i? 

CpIJ fl-75 

55 CBL0CKS1I PROOOMID 

[BLOCKS] BPDM21G 

EPR0SITE3 ASP_PROTEASE 

EK\il All_Alpha 
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PCT/IB01/02050 



EKbD 



SI6NAL_PEPTIDE 17 
L0U_C0I1PLEXITY 



Lit, V. 



10 



15 



20 



25 



SE<2 MLFIFPLSLPWRPSCUKESCSTGfiRflAGRSREDSVTPPPSSPblPTPPAGAriSTKflEARRD 



PRD ccccccccccccccceeeccccccccccccceeecccccccccccccccchhhhhhhhhh 

SE<3 EGEARTRG(3EAt2LRDRAHLSl2<2RRLK(MT<2FLHOSADLLPLDSLKRLGTSKDL<2PRSVI 

SEG 

PRD ccccccchhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccccccccccccchhhh 

SE<2 (2RRLVEGNPNli)L(3GEPPRri(3I>LIHG(2ESRRKTSRTEIPALLVNCKC<3D(2LLRVAVl>TGT(J 

SEG 

PRD hhhhhccccccccccccccccccccccccccccccccchhhhhhhchhhhhhhhhhccce 

SEfl YNRISAGCLSRLGLEKRVLKASAGDLAPGPPT<2VEflLEL<2LGflETVVCSAl2VVDAESPEF 

SEG xxxxxxxxxxxxxxxx 

PRD eeecccchhhhhhhhhhhhhhhccccccccccchhhhhhhhccceeeeccceeecccccc 

SEfl CLGL<3TLLSLKCCIDLEHGVLRLKAPFSELPFLPLY<2EPG<2 

SEG 

PRD cccchhhhhhhhhhcchhhhhhhcccccccccccccccccc 



xxxxxxxxxxxx 



Prosite for DKFZphamy2_13gn . S 



PSDDim 



i?3->ias 



ASP_PROTEASE 



PD0C0012A 



(No Pfaro data available for DKFZphamy2_13gn .2) 



WO 01/98454 
»KFZphamyS_mbS 



PCT/IB01/02050 



5 group: intracellular transport and trafficing 

DKFZphamyE_mb5 encodes a novel 771 amino acid protein which 
shows blX identity to the human TYL protein and Mfl* identity to 
the human Tic protein- 

10 

Both proteins show similarity to Sec? of Saccharomyces 
cerevisiaei which takes function in vesicular traficking. The new 
protein shows also significant similarity to human ARN03-I which 
is involved in the control of Golgi structure and function- 
15 DKFZphamyS_mb5 is predominantly expressed in the ens and germ 
cells. 

The new protein can find application in diagnosis/therapy of 
diseases related to vesicular traficking e-g- in synapses of the 
20 central nervous system and in studying expression profiles- 



25 



similarity to TYL protein (Homo sapiens) 
Sequenced by ENBL 

Locus: /map="M l *5. 7 cR from top of Chr5 linkage group" 

30 Insert length: HSSfl bp 

Poly A stretch at pos- "4511n polyadenylation signal at pos- MMflT 



1 CTCGCTCAGC 

35 SI GGGCGCGGGC 

101 CGGCCTCCGA 

151 CAGCGGTCTA 

201 CTGCCATGGA 

551 GCCACCCGTG 

40 301 TGGGATGGCC 

351 AGCGAAGGGG 

M01 GTGGCCTTCC 

M51 CCTGGGGCCA 

501 GGAGGGCTGG 

45 551 CCAGATGCTG 

bOl TGTGCGGGAT 

b51 TGCTGCGGGG 

701 CTCACGGATG 

751 CCTCATCCAG 

50 flOl GCATTGGGGA 

flSl GGGGAGCTGG 

=101 GAATGTCCTG 

151 AAGATGGCCC 

1001 ACGGACAAGT 

55 1051 GTCAGACTCA 

1101 GTGCAGACCC 

1151 CGGCTGGCAC 

1S01 GGCCCGGCAG 



CTCTCCACAT CGCGGCTCCG 
AGCTCCGACC GGCGGCGGCG 
TGGCCCCGCC GTGAGAGGCC 
GAGGAGTCCC AGGAGCAGCC 
GGAGGACAAG CTCTTATCTG 
ACCCCGGTCC AGA6CCTGAA 
AGTGAGGGCC TGAACAGCAG 
CACCCCAGCG GACACTGAGG 
ATGGCCTCAG CCTTGGCCTC 
GACTTGAACA TTCTGGAAGA 
CGTGCTGGCA GAGGGGGACA 
AGGACCCTCA GCTGGGGTTG 
GGCTTCAGCG CCACGTTTGA 
CACCCAGTAC AGCAGCCTCG 
AGAGCGACAG CTGCGTCAGC 
CAGCGGGCCC GTGACAGCCC 
CATGGCGTTT GAGGGGGACA 
GCAGCCCCCT GCGGCGCTCC 
AGCCGCCTGT CTCTCATGGC 
TCAGGGCCCA GGGGGGGt^TG 
TGCTGAACTC AGCCAGTGAC 
GACTCTGAGC TCAGCAGCTC 
TCTGGCCAAC GGGTGCCAGG 
GCCGTCTCTA CCACCTCGAG 
CTGGGCAAGA ACAACGAGTT 

-96- 



GCACCTGAAG GGACGCGGGC 
GGGCGGGACA GGCAGCCCGG 
GGACCCGCGG CGGGGACCAG 
AGGACAGGCG GAAGCAGTGG 
CAGTGCCTGA GGAAGGCGAT 
GAGGAGCCAG GGGTCCGGAA 
CCTCTGCAGC CCAGGGCACG 
AACCCACGAA GGACCCAGAT 
TCTCTCACCA ATG6CCTAGC 
TTCAGCGGAG TCCAGGCCCT 
ATGCTTCCAG GAGCCTCTAC 
GATGGTCCCG GGGAGCCAGA 
GAAGATTCTG GAGTCAGAGC 
ACTCCCTAGA CGGGCTGAGC 
TTCGAGGCCC CCCTCACACC 
TGAGCCAGGG GCTGGGTTGG 
TGGGGGCAGC TGGTGGTGAT 
ATCTCCAGCA GCCGCTCTGA 
CATGCCCAAT GGATTCCATG 
AGGATGATGA TGAGGAGGAC 
CCCAGCCTGA AGGATGGCCT 
GGAGGGGTTG GAGCCTGGTA 
GGGTCAGTGA AGCTGCTCAT 
GGCTTCCAGC GCTGTGATGT 
TAGCAGGCTG GTGGCCGGGG 



WO 01/98454 



PCT/IB01/02050 



1251 AGTACCTCAG TTTCTTCGAC 
1301 AGAACATTCT TGAAGGCCTT 
1351 GCGGGTCCTC ACACACTTCT 
1HQ1 ACAGCACTTC GGAAGATGGG 
5 1451 CTCAACACGG ACCTGCACGG 
1501 GCAATTCATT GCCAACTTGG 
1551 AAGACCTGCT GAAGACCCTT 
IbOl TGGGCCATTG ATGAGGATGA 
lfc.51 TGACAAGTTC GGGACAGGCA 

10 1701 GCAACCCCTT CCTGGATGTC 
1751 CACGGCGTCC TGACCCGGAA 
IflOl GCCCCGTGGG AGGCGTGGCT 
1A51 CCATCCTGTA CCTGCAGAAG 
nDl GAGGGTGACC TGAAGAACGC 

15 nSl GGCCTCTGAC TACAGCAAGA 
2D01 ACTG6AGGGT ATTCCTCTTC 
2051 TGGATCCTCA GGATCAACCT 
2101 CCCAGCCGCT GTCAGCTCCA 
5151 CCTGCACCAC CCGCCTCTGC 

20 2201 AAGTTGAGGC AGCTGACTGC 
2251 CGAGAGGGGC ATCAAGTCCA 
2301 ACTATCTCAC CTTCGAGAAA 
2351 GCTATGAAAA TCAAAGTGGG 
2401 GCTGGCCACT CTGGAAGGGG 

25 2151 GCCCTGCCCT CAGCCAGGGC 
2S01 GCCACTGGGC CTGATACTTA 
2551 GCAGATGTCT CCAGTGGGGT 
2fc>01 TGGACCAAGC TCCAGTCAGT 
2b51 TGTGGGCCCA GGAGATGGAG 

30 2701 CCTTGGGCAT CTCCGGGCAT 
2751 TCCACCATGG AGCCTCATTT 
2S01 CCACCTCGCT GGAGAAGCTG 
2351 TTCTCATCAA GCTCCTCTCC 
2101 GACTCTAGGT CTCAGCTGGA 

35 2=151 AGTTGACCAG CAGCAGGTCT 
3001 GCAGCCTCCA GAACCATGCC 
3051 GGGACCCAGG CTTGTGCCCT 
3101 CATCCACTTC TTTTCATCCA 
3151 ACACAAATAT ACATCTATAA 

40 3201 GATGGTTTTG GAACTGGAAT 
3251 AGCCTATTTT GGAGCTTCCC 
3301 CTGGCATTCC TGACGCTCTA 
3351 CTTGCCCTGT TTTTCCCTCT 
3401 ACCGATGATA CTTTTGGAAA 

45 3451 TCCCATGGAT ATCCTGGGGT 
3501 TATTTGGGTC TTTATGTTGG 
3SS1 CCTCTTGGAC AGATCTACTG 
3b01 AATGGCTGCT TGTCAACAAG 
3t51 GAAGCCCTTA ATTCTTGGTT 

50 37D1 TTGTTTTTCT GTCGCTATTT 
3751 TAGCCAAGGC CACATCTGGG 
3AQ1 ACGGATTAGC TAGCACCTTT 
3651 GTGGGGGTGA TGGCACATTC 
3101 TTGTTTAAAT AGATTACTGT 

55 3151 ACACACTTAG TTAATAAAAT 
4001 TACAGGCCCA CTCACATTTG 
4051 TGTAGTGATT TAAATTCAGA 
4101 TTAGACCCAA ACCATCTGGC 



TTCTCGGGCT TGACTCTGGA CGGAGCACTC 
CCCGCTGATG GGGGAGACAC AAGAGCGTGA 
CCCGCCGGTA CTGCCAGTGC AACCCTGATG 
ATCCACACGC TCACCTGTGC CCTGATGCTG 
CCACAACATT GGCAAAAAGA TGTCCTGTCA 
ACCAGCTGAA TGATGGCCAA GACTTTGCCA 
TACAACTCCA TCAAGAATGA AAAGCTGGAA 
GCTGAGGAAA TCCCTGTCTG AGCTGGTGGA 
CGAAGAAGGT GACGCGAATC CTGGATGGTG 
CCACAGGCGC TCAGTGCCAC CACCTACAAG 
GACTCACGCT GACATGGATG GCAAGAGGAC 
GGAAGAAATT CTACGCAGTG CTCAAAGGGA 
GATGAGTACA GGCCT6ACAA AGCTCTATCG 
CATTCGCGTG CATCACGCTC TGGCCACCAG 
AGTCCAACGT GCTGAAGCTT AAGACAGCCG 
CAGGCACCGA GCAAGGAAGA AATGCTGTCC 
GGTGGCAGCC ATCTTCTCTG CCCCGGCCTT 
TGAAGAAGTT CTGTCGGCCC CTGCTGCCCT 
CAGGAGGAGC AACTGCGGTC TCATGAGAAT 
G6AGCTGGCC GAACACAGGT GTCACCCAGT 
AGGAGGCCGA GGAGTACCGG TTGAAGGAGC 
AGCCGTTATG AGACCTATAT CCACCTCCTG 
CTCAGATGAT CTGGAGCGGA TTGAGGCCCG 
ATGACCCTTC TCTCCGGAAG ACACATTCAA 
CATGTGACTG GCAGCAAAAC CACAAAGGAT 
GCTGACATGG ATTTGCAGAC CCCAGGGTGG 
CAGTGAGCAC AATTCCAGCC AGGGGCCACT 
TGATGGGCAG CTAGAGGGGT GCAGAAAGCC 
ATGCCGTTTG TGGC6TTGAT CTCCTTGCGT 
CAGACCCTCT CCCTGGCCCT TGTTTTCCTC 
TGTAGGCCAG TTGTGTGCAT GCTCTAGACA 
GAAGGGCTGT TGTCTTCCCA GGTCTTTCTC 
TCATCTTTTT TGTGTGTGAG GGCAGGTCTT 
ACCCCACCCT TTCTCCTCCT CCTTCCTCTG 
GCCGACCACC AGCACCATCC TCTCCTCCCA 
CAGGTCTCCT GCCTCACATC ACAATAATCT 
TTCAGTGTAA AGCTGACTCC ATCACATGTG 
TTGAGATCAC ACTGCCTCCT TTTTATACAG 
GAATAATATA TACATAAGGA ACCCCTGAAA 
CAGTTAGAGG ATGAAATCAG ATAAAGGAAA 
CTGTTAGGAA GGATGGCTGC ACCTGGCCCC 
GGAGGGAAGG GGGAGGCAGT GCTGGCCTCC 
TCCAGCTGAC CTGTGACTTA TACTGCTCTT 
AAATAGAGCG TGTATGCACC GCCCCGTTTG 
GTGAGTCGGA TGGGACCACG GCCCTGTTTA 
TGCT6CCA6G TCTCTGAGCT CCAGAGGT6G 
CTATAGGAAT AAAAGACACT CTGTCTCGCA 
CCCAAAGATG CTTGTCGGAG GACGGTTATG 
GTGGGAAAAG GTGGAATGAC AAGTTATTGA 
CTTTCATTTG TCTAGTGAAT CAGAAAGGCT 
AAGAGTGGAG AAATTTGCCA CTTGACGATC 
AAGCCCTGCA TTTCTCCAAC TGACAAGTGG 
AGTGTGGCTA TGAAGAGCGA ATCCTCTCTA 
AGTTTGGCCA GGAATTTGGC GTCAGTGGTA 
AAGCCAGGCT TGCAACTAAG TATCTAACTT 
AGGCAAGGGG CTATTGAGTA TGTGGAGAGA 
TTATTTAAGT TGGATCAGCT GAAGTGTGTT 
CCCTTCGTTT TGCTCAGAGG AAGTAAATGT 
4151 tcacttaaat gaaattgaaa acgccatgtg gcaccacaaa agagctctct

4201 gtactttccc catgctgcct caaaagttct gtgagtttcg gggtcagtgt

4251 cccacccttc acttcccgag ggcgggtgag tggagagcag agccaggagc

4301 tctggcagct gtggacagat gtgcttcctg agcatgggtt gtgcctccca

4351 tcagtaaaaa aatgtttagt tcacttcctt aattgtataa ttatttattt

4401 gtaaattata tacatgtact actgtactaa aatattatgt acattataaa

4451 acatacacaa aaatagaaat ttaaaaaaga tgagatgaaa ataaatctaa

4501 gtcaaagttc caaaaaaaaa aaaaaaaa

BLAST Results

No BLAST result

Medline entries

=>aoab4a2:

Perletti L, Talarico D, Trecca D, Ronchetti D, Fracchiolla NS, Maiolo AT, Neri A. Identification of a novel gene, PSD, adjacent to NFKB2/lyt-10, which contains Sec7 and pleckstrin-homology domains. Genomics 46: 251-259 (1997)

Peptide information for frame 2

ORF from 206 bp to 2518 bp, peptide length: 771
Category: similarity to known protein

Classification: Cell signaling/communication

1 MEEDKLLSAV PEEGDATRDP GPEPEEEPGV RNGMASEGLN SSLCSPGHER
51 RGTPADTEEP TKDPDVAFHG LSLGLSLTNG LALGPDLNIL EDSAESRPWR
101 AGVLAEGDNA SRSLYPDAED PQLGLDGPGE PDVRDGFSAT FEKILESELL
151 RGTQYSSLDS LDGLSLTDES DSCVSFEAPL TPLIQQRARD SPEPGAGLGI
201 GDHAFEGDQG AAGGDGELGS PLRRSISSSR SENVLSRLSL MAMPNGFHED
251 GPQGPGGDED DDEEDTDKLL NSASDPSLKD GLSDSDSELS SSEGLEPGSA
301 DPLANGCQGV SEAAHRLARR LYHLEGFQRC DVARQLGKNN EFSRLVAGEY
351 LSFFDFSGLT LDGALRTFLK AFPLMGETQE RERVLTHFSR RYCQCNPDDS
401 TSEDGIHTLT CALMLLNTDL HGHNIGKKMS CQQFIANLDQ LNDGQDFAKD
451 LLKTLYNSIK NEKLEWAIDE DELRKSLSEL VDDKFGTGTK KVTRILDGGN
501 PFLDVPQALS ATTYKHGVLT RKTHADMDGK RTPRGRRGWK KFYAVLKGTI
551 LYLQKDEYRP DKALSEGDLK NAIRVHHALA TRASDYSKKS NVLKLKTADW
601 RVFLFQAPSK EEMLSWILRI NLVAAIFSAP AFPAAVSSQK KFCRPLLPSC
651 TTRLCQEEQL RSHENKLRQL TAELAEHRCH PVERGIKSKE AEEYRLKEHY
701 LTFEKSRYET YIHLLAMKIK VGSDDLERIE ARLATLEGDD PSLRKTHSSP
751 ALSQGHVTGS KTTKDATGPD T

BLASTP hits
WO 01/98454 PCT/IB01/02050
No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_14b5, frame 2

PIR:G01205 TYL protein - human, N = 2, Score = 1421, P = 8.6e-150

TREMBL:ABQ231S1_1 gene: "KIAADT42"; product: "KIAACHM2 protein"; Homo sapiens mRNA for KIAADTMS protein, partial cds., N = 1, Score = 1251, P = 2.3e-127

TREMBL:U63127_1 gene: "TIC"; product: "Tic"; Human SEC7 homolog Tic (TIC) mRNA, complete cds., N = 1, Score = 1050, P = 4.6e-106



>PIR:G01205 TYL protein - human 
Length = b15 


HSPs: 

Score = 1M21 (213-2 bits)-. Expect = fl.be-150-. Sum P(2) = fl.be- 
150 

25 Identities = 2flD/M52 (bl*)-, Positives = 33b/M52 i?**'/.) 
(Juery: 301 

DPLANGCQGVSEAAHRLARRLYHLEGFflRCDVARlJLGKNNEFSRLVAGEYLSFFDFSGLT 3b0 
D L+NG + EAA RLA+RLY L+GF++ DVAR LGKNN+FS+LVAGEYL 

30 FF F+G+T 

Sbjct: Ibb 

DTLSNGfiKADLEAAflRLAKRLYRLDGFRKADVARHLGKNNDFSKLVAGEYLKFFVFTGflT 225 
fluery: 3bl 

35 LDGALRTFLKAFPLHGETfiERERVLTHFSRRYCflCNPDDSTSEDGIHTLTCALMLLNTDL i*50 

LB ALR FLK LUGETfiERERVL HFS+RY (2CNP+ +SEDG 
HTLTCALMLLNTDL 
Sbjct: 22b 

LDflALRVFLKELALMGETflERERVLAHFSflRYFfiCNPEALSSEDGAHTLTCALMLLNTDL 2B5 


fluery: M21 

HGHNIGKKI1SC(Jl3FIANLI>flLND6(2DFAKDLLKTLYNSIKNEKLEIiIAIDEDELRKSLSEL ISO 

HGHNIGK+M+C FI NL+ LNDG DF ++LLK 
LY+SIKNEKL+WAIDE+ELR+SLSEL 
45 Sbjct: 2flb 

HGHNIGKRPITCGDFIGNLEGLNDGGDFPRELLKALYSSIKNEKLflWAIDEEELRRSLSEL 3^S 

fluery: Hfll VDDKFGTGTKKVTRIL 

DGGNPFLDVPflALSATTYKHGVLTRKTHADMDGKRTPRGR 53b 
50 T> K + RI G +PFLD+ A YKHG L R< HAD D 

++TPRG+ 

Sbjct: 31b ADPN 

PKVIKRISGGSGSGSSPFLDLTPEPGAAVYKHGALVRKVHADPDCRKTPRGK MD1 

55 fluery: 537 

RGUKKFYAVLKGTILYLflKDEYRPDKALSEGDLKNAIRVHHALATRASDYSKKSNVLKLK 5^b 

RGUK F+ +LKG ILYLflK+EY+P KALSE +LKNAI 
+HHALATRASDYSK+ +V L+ 
Sbjct: MOB

RGUKSFHGILKGillLYLflKEEYKPGKALSETELKNAISIHHALATRASDYSKRPHVFYLR Mbl 
fluery: ST? 

TADURVFLFiaAPSKEEnLSIJILRINLXXXXXXXXXXXXXXXSnKKFCRPLLPSCTTRLCfl bSb 
TADURVFLFflAPS E+M SUI RIN+ S KKF 

RPLLPS TRL a 
Sbjct: m^S 

TADURVFLF(2APSLE<3Ni3SUITRINVVAAriFSAPPFPAAVSS(3KKFSRPLLPSAATRLS(3 £21 



fluery: bS? 

EEi3LRSHENKLR(3LTAELAEHRCHPVERGIKSK:EAEEYRLKEHYLTFEKSRYETYIHLLA ?lb 
EEfl+R+HE KL+ + +EL EHR + + + KEAEE R KE YL FEKSRY 

TY LL 
15 Sbjct: SEE 

EEtfVRTHEAKLKAHASELREHRAAflLGKKGRGKEAEEflRdKEAYLEFEKSRYSTYAALLR Sfll 

(Juery: 717 MKIKVGSDDLERIEARLATLEGDDPSLRKTHSSPAL 7SE 
+K+K GS++L+ +EA LA + L +HSSP+L 

20 Sbjct: SflE VKLKAGSEELDAVEAALAC3AGSTEDGLPPSHSSPSL bl7 

Score = b3 (T-S bits)-. Expect = fl-be-lSD-i Sum P(E) = fi.be-15D 
Identities = IT/bM (ST*)-, Positives = E3/bM <35Z) 

25 CJuery: 13E DVRDGFSATFEKILESELLRGTflYXXXXXXXXXXXXXXXXX- 
CVSFEAPLTPLI(J<2RARD ITD 

J> D FS FE ILES +GT Y +FE P P 

+ 

Sbjct: Ifl 

30 DGPDSFSCVFEAILESHRAKGTSYTSLASLEALASPGPTflSPFFTFELPPflPPAPRPDPP 77 

fluery: 111 SPEP 1TM 
+P P 

Sbjct: 7fl APAP fil 


Pedant information for DKFZphamy2_14b5, frame 2

Report for DKFZphamy2_14b5.2

[LENGTH] 771

[MW] 84660.55

[pI] 5.04

[HOMOL] PIR:G01205 TYL protein - human 1e-158

[FUNCAT] 30.07 organization of intracellular transport vesicles [S. cerevisiae, YDR170c] 5e-22

[FUNCAT] 30.05 organization of golgi [S. cerevisiae, YDR170c] 5e-22

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR170c] 5e-22

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDR170c] 5e-22

[FUNCAT] unclassified proteins [S. cerevisiae, YPR015c] 4e-04

[BLOCKS] BLD1E77B
[BLOCKS] BPQE373F







[BLOCKS] PROObSSC

[BLOCKS] PRDlDflflF

[BLOCKS] PRDQSETB

[BLOCKS] BPOSbMbD

[BLOCKS] PR0Q3^1A

[BLOCKS] DI1013Smi

[BLOCKS] PFOlBbTB

[BLOCKS] PFOlBb^A

[SCOP] dlbtn 5. Ml. LI. 2 beta-spectrin [mouse (Mus musculus)] brain 1e-37

[PIRKW] transmembrane protein 1e-20

[SUPFAM] Caenorhabditis elegans K06H7.4 protein 7e-24

[SUPFAM] pleckstrin repeat homology 7e-24

[PFAM] PH (pleckstrin homology) domain

[KW] Irregular

[KW] 3D

[KW] LOW COMPLEXITY 18.45 %



20 SEfl meedkllsavpeegdatrdpgpepeeepgvrngmaseglnsslcspgherrgtpadteep 

SEG xxxxxxxxxx 

lbtn- 



25 SEfl TKDPDVAFHGLSLGLSLTNGLALGPDLNILEDSAESRPliJRAGVLAEGDNASRSLYPDAED 

SEG xxxxxxxxxxxxxxx 

Ibtn- 



30 SEtJ PflLGLDGPGEPDVRDGFSATFEKILESELLRGTcJYSSLDSLDGLSLTDESDSCVSFEAPL 

SEG xxxxxxxxxxxxxxxxx 

Ibtn- 



35 SE<2 TPLIflflRARDSPEPGAGLGIGDIIAFEGDnGAAGGBGELGSPLRRSISSSRSENVLSRLSL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

Ibtn- 

40 SE(3 MAMPNGFHEDGPfJGPGGDEDDDEEDTDKLLNSASDPSLKDGLSDSDSELSSSEGLEPGSA 

SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxx. 

Ibtn- 



45 SE<3 DPLANGC<3GVSEAAHRLARRLYHLEGF(2RCDVAR(JLGKNNEFSRLVAGEYLSFFDFSGLT 

SEG 

lbtn- 



50 SE<3 LDGALRTFLKAFPLMGETflERERVLTHFSRRYCtJCNPDDSTSEDGIHTLTCALULLNTDL 

SEG 

Ibtn- 

55 SE(3 HGHNIGKKriSCflflFIANLDfiLNDGflDFAKDLLKTLYNSIKNEKLEIilAIDEDELR'kSLSEL 

SEG 

Ibtn- 
SEQ VDDKFGTGTKKVTRILDGGNPFLDVPQALSATTYKHGVLTRKTHADMDGKRTPRGRRGWK

SEG 

lbtn- EEEEEEEEEETTTEET — 

5 TTTCEE 

SE<2 KFYAVLKGTILYLtJKDEYRPDKALSEGDLKNAIRVHHALATRASDYSKKSNVLKLKTADb) 

SEG 

lbtn- EEEEEEETTEEEEECCHHHHHHCCBTTT- 
10 TCCEETTTTEEEETTTTTCTTTEEEEETTTT 

SEfl RVFLFfiAPSKEEMLSUILRINLVAAIFSAPAFPAAVSSnKKFCRPLLPSCTTRLCflEEflL 

SEG xxxxxxxxxxxxxxx 

Ibtn- 

15 CEEEEECCCHHHHHHHHHHHH 

SE<3 RSHENKLRtJLTAELAEHRCHPVERGIKSKEAEEYRLKEHYLTFEKSRYETYIHLLAMKIK 

SEG 

lbtn- 









SEfl VGSDDLERIEARLATLEGDDPSLRKTHSSPALStJGHVTGSKTTKDATGPDT 

SEG 

lbtn- 



(No Prosite data available for DKFZphamy2_14b5.2)
Pfam for DKFZphamy2_14b5.2



HMM_NAME PH (pleckstrin homology) domain
HMM

*dvIREGli)!1yKlilgswrkstg nUqrRUFvLrndpnrLiYYkddk 

+ ++G + +++ + ++ U++ ++VL++ + L++ 

KD + 

fiuery 512 TTYKHGVLTRKTHADMDGKRTPRGRRGUKKFYAVLKG — 

40 TILYL<2KDE- SS7 



dekPr YMlIdld . cUrtlidVEidUmmdndHCFilUtrq . rtYYF 

+P+ ++++ + ++D ++ +++ +++T + 

45 R+++F 

fiuery 5Sfl -YRPDKALSEGDLKNAIRVHHALATRASDYSKK- 

SNVLKLKTADURVFLF fc,DS 

Hnn <2AeNeEEMmeldMsaIrRaIw* 
50 (2A+++EEM +U+ 1+ + + 

fiuery LDb (SAPSKEENLSIilILRINLVAA LSS 
group: transcription factors

DKFZphamy2_14m16.p1 encodes a novel 252 amino acid protein with similarity to the homeotic protein emx2 of man, mouse and zebrafish, as well as to the Drosophila melanogaster gene "empty spiracles".

Homeobox genes are known to play important roles in developmental processes. In zebrafish, emx2 mRNAs are found in the dorsal telencephalon, parts of the diencephalon and the otocyst.

The human homologue Emx2 appears to be expressed as early as 8.5-day embryos. It is also expressed in the presumptive cerebral cortex and the olfactory bulbs, and in some neuroectodermal areas of the embryonic head, including the olfactory placodes at earlier stages and the olfactory epithelia later in development. Mutants of the D. melanogaster gene "empty spiracles" display spiracles devoid of Filzkörper, no antennae and an open head.

The new protein can find application in modulating the expression of genes controlled by this transcription factor and in modulating neuronal development.

strong similarity to homeotic protein emx2 (Homo sapiens)

perhaps differential splicing

Sequenced by EMBL

Locus: /chromosome="10"

Insert length: 2416 bp

Poly A stretch at pos. 231fi, polyadenylation signal at pos. 2373



1 GAAAAAAAAA GAAAAAAAAA GAAAAAAAAT TACCCCAATC CACGCCTGCA

51 AATTCTTCTG GAAGGATTTT CCCCCCTCTC TTCAGGTTGG GCGCGTTTGG

101 TGCAAGATTC TCGGGATCCT CGGCTTTGCC TCTCCCTCTC CCTCCCCCCT

151 CCTTTCCTTT TTCCTTTCCT TTCCTTTCTT TCTTCCTTTC CTTCCCCCCA

201 CCCCCACCCC CACCCCAAAC AAACGAGTCC CCAATTCTCG TCCGTCCTCG

251 CCGCGGGCAG CGGGCGGCGG AGGCAGCGTG CGGCGGTCGC CAGGAGCTGG

301 GAGCCCAGGG CGCCCGCTCC TCGGCGCAGC ATGTTCCAGC CGGCGCCCAA

351 GCGCTGCTTC ACCATCGAGT CGCTGGTGGC CAAGGACAGT CCCCTGCCCG

401 CCTCGCGCTC CGAGGACCCC ATCCGTCCCG CGGCACTCAG CTACGCTAAC

451 TCCAGCCCCA TAAATCCGTT CCTCAACGGC TTCCACTCGG CCGCCGCCGC

501 CGCCGCCGGT AGGGGCGTCT ACTCCAACCC GGACTTGGTG TTCGCCGAGG

551 CGGTCTCGCA CCCGCCCAAC CCCGCCGTGC CAGTGCACCC GGTGCCGCCG

601 CCGCACGCCC TGGCCGCCCA CCCCCTACCC TCCTCGCACT CGCCACACCC

651 CCTATTCGCC TCGCAGCAGC GGGATCCGTC CACCTTCTAC CCCTGGCTCA

701 TCCACCGCTA CCGATATCTG GGTCATCGCT TCCAAGGGAA CGACACTAGC

751 CCCGAGAGTT TCCTTTTGCA CAACGCGCTG GCCCGAAAGC CCAAGCGGAT

801 CCGAACCGCC TTCTCCCCGT CCCAGCTTCT AAGGCTGGAA CACGCCTTTG

851 AGAAGAATCA CTACGTGGTG GGCGCCGAAA GGAAGCAGCT GGCACACAGC

901 CTCAGCCTCA CGGAAACTCA GGTAAAAGTA TGGTTTCAGA ACCGAAGAAC

951 AAAGTTCAAA AGGCAGAAGC TGGAGGAAGA AGGCTCAGAT TCGCAACAAA

1001 AGAAAAAAGG GACGCACCAT ATTAACCGGT GGAGAATCGC CACCAAGCAG

1051 GCGAGTCCGG AGGAAATAGA CGTGACCTCA GATGATTAAA AACATAAACC

1101 TAACCCCACA GAAACGGACA ACATGGAGCA AAAGAGACAG GGAGAGGTGG

1151 AGAAGGAAAA AACCCTACAA AACAAAAACA AACCGCATAC ACGTTCACCG

1201 AGAAAGGGAG AGGGAATCGG AGGGAGCAGC GGAATGCGGC GAAGACTCTG

1251 GACAGCGAGG GCACAGGGTC CCAAACCGAG GCCGCGCCAA GATGGCAGAG

1301 GATGGAGGCT CCTTCATCAA CAAGCGACCC TCGTCTAAAG AGGCAGCTGA

1351 GTGAGAGACA CAGAGAGAAG GAGAAAGAGG GAGGGAGAGA GAGAAAGAGA

1401 GAGAAAGAGA GAGAGAGAGA GAGAGAAAGC TGAACGTGCA CTCTGACAAG

1451 GGGAGCTGTC AATCAAACAC CAAACCGGGG AGACAAGATG ATTGGCAGGT

1501 ATTCCGTTTA TCACAGTCCA CTTAAAAAAT GATGATGATG ATAAAAACCA

1551 CGACCCAACC AGGCACAGGA CTTTTTTGTT TTTTGCACTT CGCTGTGTTT

1601 CCCCCCCATC TTTAAAAATA ATTAGTAATA AAAAACAAAA ATTCCATATC

1651 TAGCCCCATC CCACACCTGT TTCAAATCCT TGAAATGCAT GTAGCAGTTG

1701 TTGGGCGAAT GGTGTTTAAA GACCGAAAAT GAATTGTAAT TTTCTTTTCC

1751 TTTTAAAGAC AGGTTCTGTG TGCTTTTTAT TTTGATTTTT TTTCCCAAGA

1801 AATGTGCAGT CTGTAAACAC TTTTTGATAC CTTCTGATGT CAAAGTGATT

1851 GTGCAAGCTA AATGAAGTAG GCTCAGCGAT AGTGGTCCTC TTACAGAGAA

1901 ACGGGGAGCA GGACGACGGG GGGGCTGGGG GTGGCGGGGG AGGGTGCCCA

1951 CAAAAAGAAT CAGGACTTGT ACTGGGAAAA AAACCCCTAA ATTAATTATA

2001 TTTCTTGGAC ATTCCCTTTC CTAACATCCT GAGGCTTAAA ACCCTGATGC

2051 AAACTTCTCC TTTCAGTGGT TGGAGAAATT GGCCGAGTTC AACCATTCAC

2101 TGCAATGCCT ATTCCAAACT TTAAATCTAT CTATTGCAAA ACCTGAAGGA

2151 CTGTAGTTAG CGGGGATGAT GTTAAGTGTG GCCAAGCGCA CGGCGGCAAG

2201 TTTTCAAGCA CTGAGTTTCT ATTCCAAGAT CATAGACTTA CTAAAGAGAG

2251 TGACAAATGC TTCCTTAATG TCTTCTATAC CAGAATGTAA ATATTTTTGT

2301 GTTTTGTGTT AATTTGTTAG AATTCTAACA CACTATATAC TTCCAAGAAG

2351 TATGTCAATG TCAATATTTT GTCAATAAAG ATTTATCAAT ATGCCCTCAC

2401 AAAAAAAAAA AAAAAA



BLAST alert EMBL/EMBLNEW

EMBLNEW:AL133353 Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone RP11-MA3F1H, N = 2, Score = 310A, P = 5.3e-13i4

EMBL:HSEMX2 H. sapiens EMX2 mRNA, N = 1, Score = 2365, P = 5.1e-101

Medline entries

=12331b0b:

Simeone A, Gulisano M, Acampora D, Stornaiuolo A, Rambaldi M, Boncinelli E. Two vertebrate homeobox genes related to the Drosophila empty spiracles gene are expressed in the embryonic cerebral cortex. EMBO J 1992 Jul;11(7):2541-50



Peptide information for frame 1 
ORF from 331 bp to 1086 bp, peptide length: 252

Category: questionable ORF

Classification: unset

Prosite motifs: HOMEOBOX_1 (187-210)

1 MFQPAPKRCF TIESLVAKDS PLPASRSEDP IRPAALSYAN SSPINPFLNG
51 FHSAAAAAAG RGVYSNPDLV FAEAVSHPPN PAVPVHPVPP PHALAAHPLP
101 SSHSPHPLFA SQQRDPSTFY PWLIHRYRYL GHRFQGNDTS PESFLLHNAL
151 ARKPKRIRTA FSPSQLLRLE HAFEKNHYVV GAERKQLAHS LSLTETQVKV
201 WFQNRRTKFK RQKLEEEGSD SQQKKKGTHH INRWRIATKQ ASPEEIDVTS
251 DD



Alert BLASTP hits for DKFZphamy2_14m16, frame 1

PIR:I51737 homeotic protein emx2 - zebra fish, N = 2, Score = 753, P = 1e-105

PIR:S22722 homeotic protein emx2 - human (fragment), N = 1, Score = 763, P = 1.3e-75

TREMBL:OLA132mD3_1 gene: "emx2"; product: "Emx2 protein"; Oryzias latipes mRNA for Emx2 protein, partial; N = 2, Score = 513, P = 4.5e-72

>PIR:S22722 homeotic protein emx2 - human (fragment)
Length = 158

HSPs:

Score = 763 (114.5 bits), Expect = 1.3e-75, P = 1.3e-75
Identities = 144/144 (100%), Positives = 144/144 (100%)

Query:   109 FASQQRDPSTFYPWLIHRYRYLGHRFQGNDTSPESFLLHNALARKPKRIRTAFSPSQLLR 168
             FASQQRDPSTFYPWLIHRYRYLGHRFQGNDTSPESFLLHNALARKPKRIRTAFSPSQLLR
Sbjct:    15 FASQQRDPSTFYPWLIHRYRYLGHRFQGNDTSPESFLLHNALARKPKRIRTAFSPSQLLR 74

Query:   169 LEHAFEKNHYVVGAERKQLAHSLSLTETQVKVWFQNRRTKFKRQKLEEEGSDSQQKKKGT 228
             LEHAFEKNHYVVGAERKQLAHSLSLTETQVKVWFQNRRTKFKRQKLEEEGSDSQQKKKGT
Sbjct:    75 LEHAFEKNHYVVGAERKQLAHSLSLTETQVKVWFQNRRTKFKRQKLEEEGSDSQQKKKGT 134

Query:   229 HHINRWRIATKQASPEEIDVTSDD 252
             HHINRWRIATKQASPEEIDVTSDD
Sbjct:   135 HHINRWRIATKQASPEEIDVTSDD 158

Pedant information for DKFZphamy2_14m16, frame 1
Report for DKFZphamy2_14m16.1



3b2 

4074^.26 
10.51 

PIR:I51?37 homeotic protein emx2 - zebra fish 



3D. ID nuclear organization 



[LENGTH! 

emio 
ipii 

EH0M0L! 
113 

[FUNCAT! 
YML027w! Se-DS 

[FUNCAT! D4 . other transcription activities 

cerevisiaei YI1L027w! Se-DS 

[FUNCAT! D3-D7 pheromone response! mating-type 

determination-i sex-specific proteins [S. 

cerevisiae-i YCRO^wJ 5e-04 

[FUNCAT! D4. 05. 01. 04 transcriptional control 

cerevisiae-i YDLlObc! 7e-D4 



IS- cerevisiae 



[S- 



IS. 



[FUNCAT! 



IS. cerevisiae-. YDLlObc! 7e-D4 
[FUNCAT! 
CS - cerevisiae-. 
[BLOCKS! 



01.04-04 regulation of phosphate utilization 



01.03.13 regulation of nucleotide metabolism 
YDLlObc! 7e-D4 



PR0D04 t lI> 

[BLOCKS! PROOTOIH 

CBL0CKS3 PR004B7F 

[BLOCKS! PR0071bC 

[BLOCKS! BLDD035C 

[BLOCKS! BLDDD27 'Homeobox' domain proteins 

[BLOCKS! PRDDD2bA 

[BLOCKS! BLDD032C 

[BLOCKS! BL00032B 'Homeobox' antennapedia-type protein 

[SCOP! dlau7bl LM.l.l.b Pit-1 P0U homeodomain Pit-1 

Pit-1 [Rat (Rattu Se-lb 

[SCOP! dlyrna_ 1-4.1.1-S mating type protein Al 

Homeodomain mat alpha Se-lS 

[SCOP! dlenh 1-4.1. 1-1 engrailed Homeodomain 

[(Drosophila melanogaster 2e-13 



[PIRKU1 
[PIRKbl! 
[PIRKU! 
[PIRKbl! 
[PIRKIO 
[PIRKU1 
[PIRKU! 
[PIRKLO 
[PIRKU! 
[PIRKLJ! 
[SUPFAfl! 
[SUPFAM! 
[SUPFAfl! 
[SUPFAI1! 
[SUPFAfl! 
[SUPFAfl! 
[SUPFAM! 
[SUPFAM! 
[PR0SITE3 
[PFAM! 
[Kid! 



nucleus le-b7 
heart 3e-lD 
DNA binding le-b? 
leukemia 3e-15 
alternative splicing le-10 
proto-oncogene 3e-lS 
transcription factor be-11 
embryo 1e-12 

transcription regulation le-b7 
homeobox le-fc>7 . 
homeobox homology le-b7 
homeotic protein Hox AS 7e-10 
homeotic protein Hox B3 3e-lD 
homeotic protein Hox B2 3e-ll 
homeotic protein Hox Bl 7e-ll 
unassigned homeobox proteins le-t7 
homeotic protein goosecoid 4e-10 
homeotic protein Hox D4 16-12 
H0ME0B0X_1 1 
Homeobox domain 
Irregular 
EKbO L0U_C0I1PLEXITY 25. V.



5 SEfl 

EKKRKKKKKNYPNPRLflILLEGFSPLSSGUARLV(2])SRI>PRLCLSLSLPPPFLFPFLSFL 
SEG 

• xxxxxxxx xxxxxxxxxxxxxxxxx 

If jlA 

10 

SEA 

SSFPSPHPHPHPK(2TSP<2FSSVLAAGSGRRRi2RAAVARSlilEPRAPAPRRSI1F<2PAPKRCF 
SEG 

15 xxxxxxxxxxxx xxxxxxxxxxxxxxxx 

If jlA 



SE<2 

20 TIESLVAKDSPLPASRSEDPIRPAALSYANSSPINPFLNGFHSAAAAAAGRGVYSNPDLV 
SEG 

xxxxxx 

If jlA 

25 "*"* 
SEfl 

FAEAVSHPPNPAVPVHPVPPPHALAAHPLPSSHSPHPLFASdflRDPSTFYPULIHRYRYL 
SEG 

• • • • xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

30 IfjlA 

SEU 

GHRF(2GNDTSPESFLLHNALARKPKRIRTAFSPS<2LLRLEHAFEKNHYVVGAERK<JLAHS 
35 SEG 

IfjlA 

CCCCCCCCCCHHHHHHHHHHHHHTTTTCHHHHHHHHHH 

40 SE<3 

LSLTET(3VKVUF(2NRRTKFKR(2KLEEEGS1>S(2(3KKKGTHHINRURIATIC(2ASPEEIDVTS 
SEG 

IfjlA 

45 HCCCHHHHHHHHHHHHHHHHHH 



SE<2 DD 

SEG 

IfjlA 


Prosite for DKFZphamy2_14m16.1
PS00027 297->321 HOMEOBOX_1 PDOC00027



Pfam for DKFZphamy2_14m16.1
HMM_NAME Homeobox domain

HUM 

5 *RRRpRTtFTre<2LdELEREFHf NrYPTRqRREELAflmLNLTERflVKIWF 

+R RT+F+ +13L++LE +F + N+Y+ ++R 

+LA++L+LTE+AVK+UF 
fluery 5tM 

PKRIRTAFSPSQLLRLEHAFEKNHYVVGAERKtJLAHSLSLTETflVKVlilF 31H 


HMM flNRRflKldKRMH* 

flNRR+K KR+ 
Guery 313 flNRRTKFKRdK 3S3 
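The ORF records in this section pair a 1-based start and end with a peptide length; for example, the frame 1 ORF of the homeobox cDNA starts at 331 bp and encodes 252 residues. A minimal Python sketch of that bookkeeping follows, assuming (as the listed values suggest) that the reported end is the last base of the final sense codon, so the stop codon is excluded, and that the frame number is the start position modulo 3, with a remainder of 0 read as frame 3. The function names are hypothetical and are not part of the annotation pipeline that produced these records.

```python
def orf_summary(start: int, end: int) -> tuple[int, int]:
    """Return (peptide_length, frame) for a 1-based, inclusive ORF span.

    Assumes the span ends on the last base of the final sense codon
    (stop codon excluded) and that frame = start mod 3, with 0 read as 3.
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError(f"span of {span} nt is not a whole number of codons")
    frame = start % 3 or 3  # remainder 0 corresponds to frame 3
    return span // 3, frame


def orf_end(start: int, peptide_length: int) -> int:
    """Last base of the final sense codon for a given start and length."""
    return start + 3 * peptide_length - 1


if __name__ == "__main__":
    # Frame 1 ORF of the homeobox cDNA: starts at 331 bp, 252 residues.
    end = orf_end(331, 252)
    print(end, orf_summary(331, end))  # 1086 (252, 1)
```

Under these assumptions a start of 331 bp and 252 residues puts the last sense codon at 1086 bp, with the stop codon immediately after it.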
group: amygdala derived

DKFZphamy2_16e14.p3 encodes a novel 328 amino acid protein, similar to carbonic anhydrase-related proteins.

A similar cDNA encoding a protein of the same length was identified in sheep. This protein has a strong signal sequence, which indicates that it is secreted. The new protein belongs to a protein family designated carbonic anhydrase-related protein XI (CA-RP XI), encoded by CA11 (human) and Car11 (mouse, rat). Despite potentially inactivating changes in the active-site residues, CA-RP XI is evolving very slowly in mammals, a property indicative of an important function, which has also been observed in the two other "acatalytic" CA isoforms, CA-RP VIII and CA-RP X.

No informative BLAST results; no predictive Prosite, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of amygdala-specific genes.

similarity to carbonic anhydrase-related protein (Homo sapiens)
ESTs ending at approx. 1800 have a polyA signal

Sequenced by EMBL

Locus: /map="17qSin S-13cR from GATA41CD5"

Insert length: 2267 bp

Poly A stretch at pos. 2252, polyadenylation signal at pos. 2231



1 GGATGGAAAT AGTCTGGGAG GTGCTTTTTC TTCTTCAAGC CAATTTCATC

51 GTCTGCATAT CAGCTCAACA GAATTCACCA AAAATCCATG AAGGCTGGTG

101 GGCATACAAG GAGGTGGTCC AGGGAAGCTT TGTTCCAGTT CCTTCTTTCT

151 GGGGATTGGT GAACTCAGCT TGGAATCTTT GCTCTGTGGG GAAACGGCAG

201 TCGCCAGTCA ACATAGAGAC CAGTCACATG ATCTTCGACC CCTTTCTGAC

251 ACCTCTTCGC ATCAACACGG GGGGCAGGAA GGTCAGTGGG ACCATGTACA

301 ACACTGGAAG ACACGTATCC CTTCGCCTGG ACAAGGAGCA CTTGGTCAAC

351 ATATCTGGAG GGCCCATGAC ATACAGCCAC CGGCTGGAGG AGATCCGACT

401 ACACTTTGGG AGTGAGGACA GCCAAGGGTC GGAGCACCTC CTCAATGGAC

451 AGGCCTTCTC TGGGGAGGTG CAGCTCATCC ACTATAACCA TGAGCTATAT

501 ACGAATGTCA CAGAAGCTGC AAAGAGTCCA AATGGATTGG TGGTAGTTTC

551 TATATTTATA AAAGTTTCTG ATTCATCAAA CCCATTTCTT AATCGAATGC

601 TCAACAGAGA TACTATCACA AGAATAACAT ATAAAAATGA TGCATATTTA

651 CTACAGGGGC TTAATATAGA GGAACTATAT CCAGAGACCT CTAGTTTCAT

701 CACTTATGAT GGGTCGATGA CTATCCCACC CTGCTATGAG ACAGCAAGTT

751 GGATCATAAT GAACAAACCT GTCTATATAA CCAGGATGCA GATGCATTCC

801 TTGCGCCTGC TCAGCCAGAA CCAGCCATCT CAGATCTTTC TGAGCATGAG

851 TGACAACTTC AGGCCTGTCC AGCCACTCAA CAACCGCTGC ATCCGCACCA

901 ATATCAACTT CAGTTTACAG GGGAAGGACT GTCCAAACAA CCGAGCCCAG

951 AAGCTTCAGT ATAGAGTAAA TGAATGGCTC CTCAAGTAGG GAACAAAGCC

1001 AAGAAGAATC CCACCTCAGT GAAATGCTAC AACTGTGAAT TGACGTAACC

1051 TAGAATGTCC CCCTTCTTGC TTCTCTCTCC TTCTTTCCCC CAAGCCTCAT

1101 TCATTCTTGG GATTGGCCCT TTCTTCATGA AAAGTGTCTG CAAAACCATG

1151 GCAGAGGAAT ACATCTCTCA CACATACTCA CAAACACACA CACAAGCACT

1201 TGCACATACA TACAAACACA TGCAAACATA CCTACACACA CACACACTCT

1251 TACAACCTCC ATCATGGGAA GTCAAGTTTC AGAAACAAAA GTCTCATTCA

1301 TAAGAGGTCT TAGAAGAAAA TAACCAGTTA ACCTGATTTC AATTTTGATA

1351 CCGTTTTCCT GAACTAATAA ATCTACCCAA TGAGACTTTT CAGCCTTTGT

1401 ACATACAAAA TTCTTCCAAA AGAGAGAGGA GAAAATACAG CTCTGATGGC

1451 ATCAAACGGA CTTTGCATCA AGTAATTTCA GATAGTGTCC TAGGATCCTT

1501 TGAGGGTGCT GGTAGCAGGT GAGCAGGACA AAGTTGACCA AGGACACTTA

1551 TTTCTAGATT ATGATTCTTC TGTTTACTCA ACAATTTACA AAGAAAAAAA

1601 GGACAGACAT TGAAGAGCTA CACATTGTAT ATATATCACC ACAGACTATA

1651 AGGAAATGGA ATTATTTCCC TCTTTGTCAC ATATCTGTAG TAGGATTTGC

1701 CAAGATCAGA AATGATCCAT TTGCTGTTTC TTGTTTTCCA AAGGTCATAC

1751 ATTGTGTTTG GTTATTGTTA CCAGCTCAAT AAATGTGTTT AACGAGTTAA

1801 TTTCATTTTT CTGGCTTTGG TCTGTTCTCC TTCCTTACAG GCTAAGCCCT

1851 GGCTCCATGC AACTGCATTC TTTGATTTCA CTTGTTCCTT CATCTACATG

1901 TTTTGTTCAT TTGCAGCCAG TTTTTACTGA GTTTGTGGCA ATCAGGAATG

1951 CATTTGCTAA GCAAGTATGA CTTTAATTCC ACTCCATGGC TCAATCATTC

2001 ACATGAGGTG AGCTTCAGCC TGAGATAGCA GGCGACAGAC TTCTTGCGTT

2051 TCAAAACTGC CATGCCCCCC TGTGATGCTC CCGTGAAGGA ATGCACTTTG

2101 CCTTGTAAGT TCCTGGGAAA GGGGTATGTT TTCTCTCCAG GTGCAGCCAG

2151 ATCTCACAAA GTACAAAACG AATGCCTTTC TTTTCTTGTT TATAATGGTC

2201 ACTCACTGTG TTTGGTTACT GTCAAGAAAT CAATAAATGT GTTTAACAAG

2251 TCAAAAAAAA AAAAAAA

BLAST alert EMBL/EMBLNEW

EMBL:AFObHflSM Homo sapiens map 17q2Hi 5-13cR from GATAmC05 repeat region, complete sequence, N = 2, Score = A7fi4, P = 0

EMBLNEW:AC0D5&A3 Homo sapiens chromosome 17 clone RPU-ISBEll map 17, WORKING DRAFT SEQUENCE, 2 ordered pieces, N = 3, Score = b2t0, P = 0

Medline entries

=10^73141:

Lovejoy DA, Hewett-Emmett D, Porter CA, Cepoi D, Sheffield A, Vale WW, Tashian RE. Evolutionarily conserved, "acatalytic" carbonic anhydrase-related protein XI contains a sequence motif present in the neuropeptide sauvagine: the human CA-RP XI gene (CA11) is embedded between the secretor gene cluster and the DBP gene at 19q13.3. Genomics 1998 Dec 15;54(3):484-93
Peptide information for frame 3

ORF from 3 bp to 986 bp, peptide length: 328
Category: similarity to known protein
Classification: unclassified

1 MEIVWEVLFL LQANFIVCIS AQQNSPKIHE GWWAYKEVVQ GSFVPVPSFW
51 GLVNSAWNLC SVGKRQSPVN IETSHMIFDP FLTPLRINTG GRKVSGTMYN
101 TGRHVSLRLD KEHLVNISGG PMTYSHRLEE IRLHFGSEDS QGSEHLLNGQ
151 AFSGEVQLIH YNHELYTNVT EAAKSPNGLV VVSIFIKVSD SSNPFLNRML
201 NRDTITRITY KNDAYLLQGL NIEELYPETS SFITYDGSMT IPPCYETASW
251 IIMNKPVYIT RMQMHSLRLL SQNQPSQIFL SMSDNFRPVQ PLNNRCIRTN
301 INFSLQGKDC PNNRAQKLQY RVNEWLLK



Alert BLASTP hits for DKFZphamy2_16e14, frame 3

PIR:JE0375 carbonic anhydrase-related protein - human; N = 1, Score = 137, P = M-be-m

SWISSNEW:CAHB_SHEEP CARBONIC ANHYDRASE-RELATED PROTEIN 2 PRECURSOR (CARP 2) (CA-RP II) (CA-XI), N = 1, Score = 135, P = 7-5e-m



>PIR:JE0375 carbonic anhydrase-related protein - human 
30 Length = 32fl 

HSPs: 

Score = 137 (mO-b bits)-, Expect = M-be-m-, P = M-be-IM 
35 Identities = 11.1/267 (SflJJ)-. Positives = 223/2A7 (77*) 

Query: 3Q 

EGUUAYKEVVflGSFVPVPSFWGLVNSAUNLCSVGKRflSPVNIETSHIIIFDPFLTPLRINT SI 
E WU+YK+ +C2G+FVP P FUGLVN+AU+LC+VGKR<2SPV++E 
40 +++DPFL PLR++T 
Sbjct: 32 

EDWUSYKDNLflGNFVPGPPFWGLVNAAWSLCAVGKRfiSPVDVEVKRVLYDPFLPPLRLST 11 
(Juery: 10 

45 GGRKVSGTPIYNTGRHVSLRLDKEHLVNISGGPMTYSHRLEEIRLHFGSEDSflGSEHLLNG mi 

GG K+ GT+YNTGRHVS +VN+SGGP+ YSHRL E+RL FG+ D 

GSEH +N 
Sbjct: 12 

GGEKLRGTLYNTGRHVSFLPAPRPVVNVSGGPLLYSHRLSELRLLFGARDGAGSEHfllNH 1S1 


fluery: ISO 

flAFSGEVflLIHYNHELYTNVTEAAKSPNGLVVVSIFIKVSDSSNPFLNRtlLNRDTITRIT 201 

a FS EVflLIH+N ELY N + A++ PNGL ++S+F+ V+ 
+SNPFL+R+LNRDTITRI+ 
55 Sbjct: 152 

f2GFSAEV(2LIHFN<2ELYGNFSAASRGPNGLAILSLFVNVASTSNPFLSRLLNRDTITRIS 211 
Query: BIO

YKNDAYLLtJGLNIEELYPETSSFITYDGSriTIPPCYETASUIiriNKPVYITRMtSnHSLRL Sbl 
YKNDAY L<2 L++E L+PE+ FITY GS++ PPC ET +UI++++ + IT 

+<3MHSLRL 
5 Sbjct: 212 

YKNDAYFLflDLSLELLFPESFGFITYdGSLSTPPCSETVTWILIDRALNITSLflflHSLRL 271 

fiuery: 270 LS(2Nl2PS<2IFLSHSDNFRPVflPLNNRCIRTNINFSL<2GKDC — PNNR 31M 
LSflN PS<2IF S+S N RP+flPL +R +R N + + C PN R 

10 Sbjct: 272 LSflNPPSfllFflSLSGNSRPLflPLAHRALRGNRDPRHPERRCRGPNYR 31fl 



Pedant information for DKFZphamy2_16e14, frame 3
Report for DKFZphamy2_16e14.3




ELENGTH3 


3Efl 




EMio 


37Sb3.n 




EpI3 


a. 22 




EH0H0L3 


PIR5JED37S carbonic anhydrase-related protein 




human le-101 




EBL0CKS3 


DMOllOTB 




EBL0CKS3 


BL001h2F 




EBL0CKS3 


BL001b2E 




EBL0CKS3 


BL001fci2D 




EBL0CKS3 


BLDOlbEC Eukarvotic- tvDe carbonic anhvdrases 




proteins 






IBL0CKS3 


BLODlbSA Eukaryot ic-type carbonic anhydrasGS 




proteins 






ESC0P3 


dlznca_ 2-5b-l-l-3 Carbonic anhydrase Ehuman 




(Homo sapiens 


Ie-1D3 




ESC0P3 


d2cba 2-5b-l-l-2 Carbonic anhydrase inhuman 




(Homo sapiens 


Te-17 




EEC3 


4-2-1-1 Carbonate dehydratase le-3b 




EEC3 


3-1-3-Mfl Protein-tyrosine-phosphatase He-20 




EPIRKU3 


blocked amino end fie-i?*! 




EPIRKliO 


carbon-oxygen lyase le-3b 




EPIRKU3 


zinc le-3b 




EPIRKW3 


polymorphism 2e-20 




EPIRKU3 


hydro-lyase le-3b 




EPIRKU3 


transmembrane protein 3e-23 




EPIRKliO 


tyrosine-specif ic phosphatase 2e-BD 




EPIRKU3 


brain be-lb 




EPIRKW3 


acetylated amino end le-3b 




EPIRKU3 


phosphatidylinositol linkage 2e-lT 




EPIRKW3 


receptor 2e-2Q 




EPIRKU3 


liver 36-2^ 




EPIRKtO 


phosphoprotein 2e-20 




EPIRKliO 


saliva 2e-21 




EPIRKtO . 


glycoprotein 2e-22 




EPIRKIO 


mitochondrion le-32 




EPIRKIO 


monomer 3e-32 




EPIRKIO 


alternative splicing be-lb 




EPIRKIO 


lipoprotein 2e-lT 




EPIRKIO 


pyroglutamic acid 2e-21 




EPIRKIO 


metalloprotein be-35 




EPIRKIO 


muscle Me-31 
[PIRKW]


membrane protein Ee-n 


EPIRKliD 


phosphoric monoester hydrolase Ee-EQ 


EPIRKbD 


homodimer 3e-E3 


CSUPFArO 


fibronectin type III repeat homology Se-ED 


ESUPFAnj 


carbonic anhydrase homology le-3b 


(LSUPFAMJ 


protein-tyrosine-phosphatasei receptor type zeta 


be-lb 




CSUPFAM3 


carbonate dehydratase le-3b 


CSUPFAH3 


protein-tyrosine-phosphatase i receptor type gamma 


Se-SO 




ESUPFAPD 


protein-tyrosine-phosphatase homology Ee-EO 


CSUPFAM3 


leukocyte common antigen cytosolic domain 


homology Ee-EO 




CPFAill 


Eukaryotic-type carbonic anhydrases 




All Beta 


EKkU 


3D 


CKU3 


SIGNAL PEPTIDE 22 


SEfl 





nEIVUEVLFLLt2ANFIVCISAt2(2NSPKIHEGliJb)AYKEVV(2GSFVPVPSFIJ6LVNSAIilNLC 
lugc- 



25 SEC 

- SVGKRflSPVNIETSHMIFDPFLTPLRINTGGRKVSGTHYNTGRHVSLRLDKEHLVNISGG 
luge- • • TTTTCCCEETTTTTEETTTTCEEEEETT- 
TTCEEEEEETTTTEEEEECTTTTTEEEEE 

30 SE(2 

PMTYSHRLEEIRLHFGSEDS<2GSEHLLNGl3AFSGEVflLIHYNHELYTNVTEAAKSPNGLV 
luge- TTCCCEEEEEEEEEETTTTTTCTTTEETTBCCCEEEEEEEEEGG- 
GTTHHHHHCTTTTEE 

35 SE(3 

VVSIFIKVSDSSNPFLNRMLNRDTITRITYKNDAYLL<2GLNIEELYPETSSFITYDGSI1T 
luge- EEEEEEEEC-CCCGGGHHHH— 
HHGGGCCTTTEEEETTTTCG6GGCCCCCCEEEEEECCC 

40 SE(2 

IPPCYETASUIII1NKPVYITRf1i2MHSLRLLS(aNi3PS(3IFLSI1SDNFRPV(2PLNNRCIRTN 
lugc- 

TTTTCCCEEEEEECCCEEECHHHHHHHHCCBCCTTTTCCCBTTTTCCCCCCTTTTCCEEC 

45 SE<J INFSLdGKDCPNNRAflKLflYRVNEULLK 
luge- 



(No Prosite data available for DKFZphamy2_16e14.3)



Pfam for DKFZphamy2_16e14.3



HMM_NAME Eukaryotic-type carbonic anhydrases

55 

win 

*WCYgeHUGPEHH WHkhYPIAti) GDRi2SPINI(2lilkearYDPS 
WYE + + + G RflSP+NI ++

+DP 

fluery 33 

IrfAYKEVVflGSFVPVPSFUGLVNSAWNLCSVGKRflSPVNIETSHniFDPF fll 


win 

LKPUrv • SYYpaUCrEUeIlilNN6HSF(2VeFDDSf1DnSVLsGGPLPgHPYR 

L P+R+ ++ ++++ ++ N+G+ + +D +SGGP++ 

++R 

10 fluery 62 LTPLRINTGGRKVSG — THYNTGRHVSLRLDK- 

EHLVNISGGPHTY-SHR ■ 127 

Hnn 

LkflFHFHUGGASsNDUGSEHTVDGmkYPMELHLVHWNStKYnNYdEAfldq 
15 L + ++H G S++ +GSEH ++G +++ E+ L+H+N +Y N+ 

EA++ 

fiuery 12fi LEEIRLHFG— 

SEDS<2GSEHLLNG<2AFSGEVi3LIHYNHELYTNVTEAAKS 175 

20 Hnn 

PDGLAVIGVFMKVGNYqENPyLUKVv . - DALdnlKYKGKratllTNFDPsC 

P+GL V+ +F+KV NP L++ + D + I YK + 

+++++ 

fluery 17b PNGLVVVSIFIKVS- 

25 dssnpflnrnlnrdtitritykndayllflglniee 22^1 
hum 

LLPpPnCRDYUTYPGSLTTPPChECVTUIVCKEPIsISsEUIIUKFRsLLF 

L P+ + TY GS+T+PPC+E UI+ P+ I + UPI +R 

30 L 

fluery 225 LYPE— 

TSSFITYDGSi1TIPPCYETASIi)IIHNKPVYITRri(2riHSLRLLS(3 272 

Him NhEGEeeVpnVDNklRPPflPLKhRWRASF* 
35 N +n DN+RP (2PL++R +R + 

fluery 273 NflPSfllFLSnSDNFRPVflPLNNRCIRTNI 301 
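Each peptide in these records is the translation of its ORF in a stated reading frame. The following is a minimal sketch of that step, assuming the standard genetic code (NCBI translation table 1) and 1-based frames. The helper names are illustrative only and are not the tool that produced these records. Translating the first 23 nucleotides of the carbonic anhydrase-related cDNA in frame 3 should reproduce the start of its peptide, MEIVWEV.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons are enumerated
# with the first, second and third positions each running over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, frame: int = 1, stop_at_stop: bool = True) -> str:
    """Translate DNA in a 1-based reading frame; '*' marks a stop codon."""
    seq = "".join(dna.split()).upper()  # drop the 10-nt grouping spaces
    protein = []
    for i in range(frame - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous bases
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("GGATGGAAAT AGTCTGGGAG GTG", frame=3))  # MEIVWEV
    print(translate("ATGTTCCAGC CGGCGCCCAA GCGCTGCTTC"))    # MFQPAPKRCF
```

The second call translates the first ten codons of the frame 1 ORF of the homeobox cDNA (nucleotides 331 to 360), which match the first ten residues of its peptide listing.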
group: nucleic acid management

DKFZphamy2_1c12 encodes a novel 422 amino acid protein with partial identity to I-kappa-B-related protein and to BRCA1.

I-kappa-B-related protein interacts with transcription factors, and BRCA1 has a function in the DNA damage response. I-kappa-B-alpha mutations contribute to constitutive NF-kappaB activity in cultured and primary HRS (Hodgkin/Reed-Sternberg) cells and are therefore involved in the pathogenesis of Hodgkin's disease (HD).

The new protein can find application in modulating DNA repair and mutagenesis, and also in expression profiling in HD-related syndromes.

similarity to I-kappa-B-related protein
Sequenced by MediGenomix

Locus: unknown

Insert length: 1645 bp

Poly A stretch at pos. IbEtn, polyadenylation signal at pos. 1605

30 

1 GGATTTTCCT TGGTCTTAAG ATGGGTAGAA ATGTGATGCG ACACATGTCT
51 GATGACTTAG GAAGTTATGT TTCTCTTTCG TGTGATGACT TTTCTTCACA
101 GGAATTAGAG ATTTTCATTT GCTCCTTTTC CTCCTCCTGG CTTCAAATGT
151 TTGTTGCAGA GGCAGTCTTT AAAAAGTTGT GTCTACAGAG CTCTGGCAGT
201 GTTTCTTCTG AGCCACTCTC TCTTCAGAAA ATGGTATATT CCTATTTACC
251 AGCCTTGGGG AAAACTGGTG TGCTTGGGTC TGGAAAGATT CAGGTGTCAA
301 AGAAAATAGG ACAGCGGCCT TGTTTTGACT CTCAGAGAAC CTTACTAATG
351 CTGAATGGTA CTAAACAAAA ACAAGTCGAA GGGCTGCCAG AGTTACTAGA
401 CCTGAACCTT GCTAAATGTT CCTCATCATT AAAAAAATTG AAAAAGAAGT
451 CAGAAGGAGA ATTGTCATGT TCCAAGGAGA ATTGCCCCTC TGTAGTTAAA
501 AAGATGAATT TTCACAAGAC TAATCTAAAA GGAGAAACAG CCCTGCATAG
551 AGCTTGCATA AATAACCAAG TGGAGAAATT GATTCTTCTT CTCTCTTTGC
601 CAGGAATAGA CATCAATGTT AAAGACAATG CTGGCTGGAC GCCTTTGCAT
651 GAAGCCTGTA ACTATGGCAA CACAGTGTGT GTCCAGGAAA TTTTGCAACG
701 TTGTCCAGAG GTAGATCTGC TCACTCAAGT GGACGGGGTG ACTCCTTTGC
751 ATGATGCACT GTCAAACGGA CATGTAGAAA TTGGCAAGCT GCTACTACAG
801 CATGGGGGCC CAGTGCTTTT ACAACAGAGG AATGCTAAGG GAGAATTGCC
851 CTTGGATTAT GTGGTTTCAC CTCAAATCAA AGAAGAACTG TTTGCTATTA
901 CAAAAATAGA AGATACAGTG GAGAACTTTC ATGCACAAGC AGAGAAACAT
951 TTTCATTACC AGCAACTTGA ATTTGGCTCC TTTTTACTTA GTAGGATGTT
1001 GCTAAATTTT TGTTCAATTT TTGATTTATC TTCAGAGTTC ATTTTAGCTT
1051 CCAAAGGGTT AACTCATCTA AATGAACTGC TTATGGCTTG TAAAAGTCAT
1101 AAAGAAACCA CCAGTGTTCA TACTGACTGG TTACTGGATC TTTATGCTGG
1151 AAATATAAAG ACATTGCAGA AACTCCCACA CATTCTTAAG GAACTGCCTG
1201 AGAATTTGAA AGTGTGTCCT GGGGTACACA CTGAGGCCTT GATGATAACA
1251 TTGGAAATGA TGTGTCGGTC AGTCATGGAG TTTTCATGAT GATGCTAGAA
1301 AGTATGGATT GACTTTCTAA ATCTGTTCAG TTTGCATTGG TACTTACTGT
1351 GGACTTCATA GCTTACTGAC AGATAGTAAT TTGATTTATT TATTGACAGA
1401 CTTTGCAGCC TTGCTAAATT TTAAAAGCAT TTTTAAAAAA ACTTCTACAA
1451 AACTCTAGTA TGGGCTTCTG ACTTTTTCCA GGGTGTAGAA TTTGACTCAA
1501 AAGTAAAAAT AATTTTGTTT TAGTATATTC TACTTTCATT AATGTTTTTT
1551 TGTTCTGAAA GTGATATTAT ATTGTACATG TAAAATTAAT TTAAATATTT
1601 TTTCAAATAA AAATGTAATG TCCTGTAAAA AAAAAAAAAA AAAAA



BLAST Results 



No BLAST result 
Medline entries



No Medline entry 



Peptide information for frame 3 



ORF from 21 bp to 1286 bp; peptide length: 422
Category: similarity to known protein 
Classification: Cell signaling/communication 

1 MGRNVMRHMS DDLGSYVSLS CDDFSSQELE IFICSFSSSW LQMFVAEAVF

51 KKLCLQSSGS VSSEPLSLQK MVYSYLPALG KTGVLGSGKI QVSKKIGQRP

101 CFDSQRTLLM LNGTKQKQVE GLPELLDLNL AKCSSSLKKL KKKSEGELSC

151 SKENCPSVVK KMNFHKTNLK GETALHRACI NNQVEKLILL LSLPGIDINV

201 KDNAGWTPLH EACNYGNTVC VQEILQRCPE VDLLTQVDGV TPLHDALSNG

251 HVEIGKLLLQ HGGPVLLQQR NAKGELPLDY VVSPQIKEEL FAITKIEDTV

301 ENFHAQAEKH FHYQQLEFGS FLLSRMLLNF CSIFDLSSEF ILASKGLTHL

351 NELLMACKSH KETTSVHTDW LLDLYAGNIK TLQKLPHILK ELPENLKVCP

401 GVHTEALMIT LEMMCRSVME FS
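The ORF coordinates and the peptide length are related by codon arithmetic: 21 to 1286 bp spans 1266 nt, or 422 codons, and the TGA stop at bases 1287-1289 lies just outside the reported range. A minimal sketch, assuming the standard genetic code and 1-based inclusive coordinates that exclude the stop codon (an inference that also fits the other clones in this listing); `translate` and `orf_peptide_length` are illustrative helpers, not part of the annotation pipeline:

```python
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order
# for the first, second and third base.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence up to the first stop codon.

    Ambiguous bases (N, S, ...) are not handled and raise KeyError.
    """
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def orf_peptide_length(start_bp: int, end_bp: int) -> int:
    """Residue count for a 1-based, inclusive ORF that excludes the stop codon."""
    span = end_bp - start_bp + 1
    if span % 3:
        raise ValueError("ORF span is not a whole number of codons")
    return span // 3
```

Translating the first 30 nt from the ATG at base 21 gives MGRNVMRHMS, the first ten residues of the peptide above.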




BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_lcl2, frame 3

PIR:A56421 I-kappa-B-related protein - human, N = 1, Score = 242, P = 4.6e-18


TREMBLNEW:AF038042_1 gene: "BARD1"; product: "BRCA1-associated RING domain protein"; Homo sapiens BRCA1-associated RING domain protein (BARD1) gene, exons 10, 11 and complete cds., N = 1, Score = 236, P = 6.1e-17
>PIR:A56421 I-kappa-B-related protein - human
Length = 481

HSPs:

Score = 242 (36.3 bits), Expect = 4.6e-18, P = 4.6e-18
Identities = 52/118 (44%), Positives = 71/118 (60%)

Query: 156 PSVVKKMNFHKTNLKGETALHRACINNQVEKLILLLSLPGIDINVKDNAGWTPLHEACNY 215
P K +++ N GET LHRACI (2+ ++ L+ G +N +D 

GldTPLHEACNY 

Sbjct: 354 PGAAKGSKUNRRNDtlGETLLHRACIEGfiLRRVCJDLVR- 
15 (JGHPLNPRDYCGUTPLHEACNY 412 

Query: 216 GNTVCVQEILQRCPEVDLL--TQVDGVTPLHDALSNGHVEIGKLLLQHGGPVLLQQRNA 272

G+ V+ +L VD +G+TPLHDAL+ GH E+ +LLL+ G V 

20 L+ R A 

Sbjct: 413 

GHLEIVRFLLDHGAAVDDPGGflGCEGITPLHDALNCGHFEVAELLLERGASVTLRTRKA 471 
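In the HSP header above, Expect and P carry the same value. Under the Karlin-Altschul statistics used by BLAST the two are linked by P = 1 - e^(-E), so for very small E they are numerically indistinguishable and only diverge as E approaches 1. A short sketch of the conversion:

```python
import math


def evalue_to_pvalue(e: float) -> float:
    """Convert a BLAST expectation E into P = 1 - exp(-E)."""
    # -expm1(-E) stays accurate for tiny E, where 1 - exp(-E) would round to 0.
    return -math.expm1(-e)


print(evalue_to_pvalue(4.6e-18))  # practically equal to E for this HSP
print(evalue_to_pvalue(1.0))      # about 0.632: E and P differ once E nears 1
```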



Pedant information for DKFZphamy2_lcl2, frame 3

Report for DKFZphamy2_lcl2.3



[LENGTH] 422

[MW] 47071.18

[pI] 6.57

[HOMOL] PIR:A56421 I-kappa-B-related protein - human 5e-11

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YIL112w] 3e-11

[FUNCAT] 06.13.01 cytoplasmic degradation [S. cerevisiae, YGR232w] 4e-06

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YIR033w] 2e-04

[FUNCAT] 04.05.01.07 chromatin modification [S. cerevisiae, YIR033w] 2e-04

[SCOP] d1awcb_ 1.11.3.1.2 GA binding protein (GABP) alpha GA bindin 6e-24

[EC] 3.1.3.53 Myosin-light-chain-phosphatase 9e-06

[PIRKW] phosphotransferase 3e-07
[PIRKW] tandem repeat 1e-06
[PIRKW] transmembrane protein 7e-10
[PIRKW] serine/threonine-specific protein kinase 3e-07
[PIRKW] phosphoprotein 3e-10
[PIRKW] integrin binding 3e-07
[PIRKW] alternative splicing 3e-11
[PIRKW] peripheral membrane protein 2e-01
[PIRKW] transcription regulation 3e-06
[PIRKW] phosphoric monoester hydrolase 1e-06
[PIRKW] cytoskeleton 4e-10
[PIRKW] smooth muscle 9e-06

[SUPFAM] ankyrin 3e-11
[SUPFAM] ankyrin repeat homology 3e-11
[SUPFAM] unassigned ankyrin repeat proteins 7e-10
[PFAM] Ank repeat

[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 8.53 %

SEQ MGRNVMRHMSDDLGSYVSLSCDDFSSQELEIFICSFSSSWLQMFVAEAVFKKLCLQSSGS

SEG xxxxxx

1awcB

SEQ VSSEPLSLQKMVYSYLPALGKTGVLGSGKIQVSKKIGQRPCFDSQRTLLMLNGTKQKQVE

SEG xxxxxxxx

1awcB

SEQ GLPELLDLNLAKCSSSLKKLKKKSEGELSCSKENCPSVVKKMNFHKTNLKGETALHRACI

SEG xxxxxxxxxxxxxxxxxxxxxx

1awcB

SEQ NNQVEKLILLLSLPGIDINVKDNAGWTPLHEACNYGNTVCVQEILQRCPEVDLLTQVDGV

SEG

1awcB

TTTTCCHHHHHHHHCCHHHHHHHHHCCCTTTTCTTTTC-

SEQ TPLHDALSNGHVEIGKLLLQHGGPVLLQQRNAKGELPLDYVVSPQIKEELFAITKIEDTV

SEG

1awcB

CHHHHHHHHTTHHHHHHHHHCCCTT

SEQ ENFHAQAEKHFHYQQLEFGSFLLSRMLLNFCSIFDLSSEFILASKGLTHLNELLMACKSH

SEG

1awcB

SEQ KETTSVHTDWLLDLYAGNIKTLQKLPHILKELPENLKVCPGVHTEALMITLEMMCRSVME

SEG

1awcB

SEQ FS
SEG . .
1awcB
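The LOW_COMPLEXITY keyword in the report above is the share of residues masked by SEG. The SEG rows of this display mark 6 + 8 + 22 = 36 of the 422 residues, which gives the reported 8.53 %. A minimal sketch of that calculation, assuming each 'x' in a SEG row marks exactly one masked residue:

```python
from typing import Sequence


def low_complexity_percent(seg_rows: Sequence[str], length: int) -> float:
    """Share of residues masked by SEG, in percent, rounded to two decimals.

    Each 'x' in a SEG annotation row is assumed to mark one masked residue.
    """
    masked = sum(row.count("x") for row in seg_rows)
    return round(100.0 * masked / length, 2)


# Masked runs of 6, 8 and 22 residues in a 422-residue protein.
print(low_complexity_percent(["xxxxxx", "xxxxxxxx", "x" * 22], 422))  # 8.53
```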



(No Prosite data available for DKFZphamy2_lcl2.3)

Pfam for DKFZphamy2_lcl2.3 



HMM_NAME Ank repeat

HMM *GyTPLHIAARyNNvEI1Vr ILLflH • GADIN*
G+T+LH A+++N+VE LLL+ G DIN
Query 171 GETALHRACINNQVEKLILLLSLPGIDIN 199

(bits) f: 205 t: 232 Target: dkfzphamy2_lcl2.3
similarity to I-kappa-B-related protein
Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEI1VrlLLflHGADIN*
G+TPLH A+ Y+N + +V+ LQ+ + ++
dkfzphamy2 205 GWTPLHEACNYGNTVCVQEILQRCPEVD 232

(bits) f: 239 t: 266 Target: dkfzphamy2_lcl2.3
similarity to I-kappa-B-related protein
Alignment to HMM consensus:
HMM *GyTPLHIAARyNNvEMVrlLLfiHGADIN*
G TPLH A +++VE+ +LLLQHG +
Query 239 GVTPLHDALSNGHVEIGKLLLQHGGPVL 266

DKFZphamy2_lil

group: nucleic acid management 

DKFZphamy2_lil encodes a novel 629 amino acid protein with
similarity to the murine hemin-sensitive initiation factor 2 alpha
kinase.

The hemin-sensitive initiation factor 2 alpha kinase is expressed
predominantly in liver, spleen, colon and uterus and contains 2
protein kinase motifs. The mouse homologue inhibits protein
synthesis under stress conditions by phosphorylating eIF2alpha.
Four different eIF2alpha kinases have been identified in
mammalian cells: the heme-regulated inhibitor (HRI), the
interferon-inducible RNA-dependent kinase (PKR), the endoplasmic
reticulum-resident kinase (PERK) and mGCN2. The new protein
represents a new member of this family.

The new protein can find application in modulating or blocking
translation.


similarity to hemin-sensitive initiation factor 2 alpha kinase (Mus musculus), complete cds.

complete cds.
probably complete in genomic clone DJ0042M02

Sequenced by MediGenomix 

Locus: /map="37.2 cR from top of Chr7 linkage group"


Insert length: 2863 bp

Poly A stretch at pos. 2844, polyadenylation signal at pos. 2824



1 GCAGTGCTGG GCTGGCCGGC GGGCTGGGCT GCGGCCCGCG CGCGGCCGGC
51 GATGCAGGGG GGCAACTCCG GGGTCCGCAA GCGCGAAGAG GAGGGCGACG
101 GGGCTGGGGC TGTGGCTGCG CCGCCGGCCA TCGACTTTCC CGCCGAGGGC
151 CCGGACCCCG AATATGACGA ATCTGATGTT CCAGCAGAAA TCCAGGTGTT
201 AAAAGAACCC CTACAACAGC CAACCTTCCC TTTTGCAGTT GCAAACCAAC
251 TCTTGCTGGT TTCTTTGCTG GAGCACTTGA GCCACGTGCA TGAACCAAAC
301 CCACTTCGTT CAAGACAGGT GTTTAAGCTA CTTTGCCAGA CGTTTATCAA
351 AATGGGGCTG CTGTCTTCTT TCACTTGTAG TGACGAGTTT AGCTCATTGA
401 GACTACATCA CAACAGAGCT ATTACTCACT TAATGAGGTC TGCTAAAGAG
451 AGAGTTCGTC AGGATCCTTG TGAGGATATT TCTCGTATCC AGAAAATCAG
501 ATCAAGGGAA GTAGCCTTGG AAGCACAAAC TTCACGTTAC TTAAATGAAT
551 TTGAAGAACT TGCCATCTTA GGAAAAGGTG GATACGGAAG AGTATACAAG
601 GTCAGGAATA AATTAGATGG TCAGTATTAT GCAATAAAAA AAATCCTGAT
651 TAAGGGTGCA ACTAAAACAG TTTGCATGAA GGTCCTACGG GAAGTGAAGG
701 TGCTGGCAGG TCTTCAGCAC CCCAATATTG TTGGCTATCA CACCGCGTGG
751 ATAGAACATG TTCATGTGAT TCAGCCACGA GACAGAGCTG CCATTGAGTT
801 GCCATCTCTG GAAGTGCTCT CCGACCAGGA AGAGGACAGA GAGCAATGTG
851 GTGTTAAAAA TGATGAAAGT AGCAGCTCAT CCATTATCTT TGCTGAGCCC
901 ACCCCAGAAA AAGAAAAACG CTTTGGAGAA TCTGACACTG AAAATCAGAA
951 TAACAAGTCG GTGAAGTACA CCACCAATTT AGTCATAAGA GAATCTGGTG
1001 AACTTGAGTC GACCCTGGAG CTCCAGGAAA ATGGCTTGGC TGGTTTGTCT
1051 GCCAGTTCAA TTGTGGAACA GCAGCTGCCA CTCAGGCGTA ATTCCCACCT
1101 AGAGGAGAGT TTCACATCCA CCGAAGAATC TTCCGAAGAA AATGTCAACT
1151 TTTTGGGTCA GACAGAGGCA CAGTACCACC TGATGCTGCA CATCCAGATG
1201 CAGCTGTGTG AGCTCTCGCT GTGGGATTGG ATAGTCGAGA GAAACAAGCG
1251 GGGCCGGGAG TATGTGGACG AGTCTGCCTG TCCTTATGTT ATGGCCAATG
1301 TTGCAACAAA AATTTTTCAA GAATTGGTAG AAGGTGTGTT TTACATACAT
1351 AACATGGGAA TTGTGCACCG AGATCTGAAG CCAAGAAATA TTTTTCTTCA
1401 TGGCCCTGAT CAGCAAGTAA AAATAGGAGA CTTTGGTCTG GCCTGCACAG
1451 ACATCCTACA GAAGAACACA GACTGGACCA ACAGAAACGG GAAGAGAACA
1501 CCAACACATA CGTCCAGAGT GGGTACTTGT CTGTACGCTT CACCCGAACA
1551 GTTGGAAGGA TCTGAGTATG ATGCCAAGTC AGATATGTAC AGCTTGGGTG
1601 TGGTCCTGCT AGAGCTCTTT CAGCCGTTTG GAACAGAAAT GGAGCGAGCA
1651 GAAGTTCTAA CAGGTTTAAG AACTGGTCAG TTGCCGGAAT CCCTCCGTAA
1701 AAGGTGTCCA GTGCAAGCCA AGTATATCCA GCACTTAACG AGAAGGAACT
1751 CATCGCAGAG ACCATCTGCC ATTCAGCTGC TGCAGAGTGA ACTTTTCCAA
1801 AATTCTGGAA ATGTTAACCT CACCCTACAG ATGAAGATAA TAGAGCAAGA
1851 AAAAGAAATT GCAGAACTAA AGAAGCAGCT AAACCTCCTT TCTCAAGACA
1901 AAGGGGTGAG GGATGACGGA AAGGATGGGG GCGTGGGATG AAAGTGGACT
1951 TAACTTTTAA GGTAGTTAAC TGGAATGTAA ATTTTTAATC TTTATTAGGG
2001 TATAGTTGGT ACAATGCTTC GTTGTATTTA GTAAGCCTTT ACAAGACTTG
2051 TTAAAGATGT CAGAGTGCCC CAAGCTGCCG TTCCTTCCCT TCCTGCCCCA
2101 CAAGCTCCTT TTCCTGAATT TCCTACCTAA ATATTAACCA TATGCCTAGT
2151 CTCTGAAACT AAAAACTTGG ACCTCATCCT CAATTATTTT CTCCTTTCAA
2201 CTCTGTTGAC CCTCTGTCTG GTCTTCCTCT AGAAGGTTCT ACCGCAGAAA
2251 TTGATGTGTG CTCCCTGCCC TCGTCACTGC CCAAGCCCGG GCCTGCACAT
2301 ACTCACTGGA CTGTTCCAGT TTTGACAGCT GCCAGTCTTC CTGCCCCTTT
2351 CACACTGCAG CTGAAGTTCA TTACCTGAAG GACGCCTCAT CATTTCATTC
2401 CTTGGCTCCA AACCTTCTGC TGCCTCTAAG ATAAAAGCTC AACTTCTTAA
2451 CAGTGTACAG TGTGCAACTT CCAACCTTTT TATCTGTTCT CTCCACCTTC
2501 AGTTTAGCGT CATTCCAAAA CCACACCCTT GCAAAGCTTT GTACTCCGCA
2551 CCCCAGATGA TCTCCAGGCA GCTCAGATCT CTTTCCTGCC TTTGCCCTGC
2601 ACTGTTCCCC GGTACTTCCT CCTTTATTGT AGCACTCAGC TCCCCAGCCA
2651 ATCTGTACAT CCCTCAGAGG CAGCGATCTG ATGAATTGGT TTTTGAATCC
2701 CAGAAAGGGT CTGCCATGGA GTTGGCAGTC ATCACGGTAG ATGGCGTATG
2751 ATTTTGCTGA ATTTTAAATA AAATGAAAAC CATAAATTAC ATGATGCTTT
2801 TATTGACACT TGACAACTGG CCTAAATAAA AAGACTCTGA CTCCAAAAAA
2851 AAAAAAAAAA AAA


BLAST Results 
Entry AF028808 from database EMBL:

Mus musculus hemin-sensitive initiation factor 2 alpha kinase mRNA, complete cds.

Score = 6688, P = 2.7e-216, identities = 1122/2534

Entry AC005115 from database EMBL:

Homo sapiens clone DJ0042M02, WORKING DRAFT SEQUENCE, 13 unordered pieces.

Score = 5116, P = 0.0e+00, identities = 1010/1146

Medline entries 



Berlanga J.J., Herrero S., de Haro C.; Characterization of the
hemin-sensitive eukaryotic initiation factor 2alpha kinase from
mouse nonerythroid cells; J. Biol. Chem. 273(48):32340-32346 (1998).



Peptide information for frame 1 



ORF from 52 bp to 1938 bp; peptide length: 629
Category: similarity to known protein
Classification: Protein management
Prosite motifs: PROTEIN_KINASE_ATP (173-196)
PROTEIN_KINASE_ATP (173-197)
PROTEIN_KINASE_ST (437-449)

1 MQGGNSGVRK REEEGDGAGA VAAPPAIDFP AEGPDPEYDE SDVPAEIQVL

51 KEPLQQPTFP FAVANQLLLV SLLEHLSHVH EPNPLRSRQV FKLLCQTFIK

101 MGLLSSFTCS DEFSSLRLHH NRAITHLMRS AKERVRQDPC EDISRIQKIR

151 SREVALEAQT SRYLNEFEEL AILGKGGYGR VYKVRNKLDG QYYAIKKILI

201 KGATKTVCMK VLREVKVLAG LQHPNIVGYH TAWIEHVHVI QPRDRAAIEL

251 PSLEVLSDQE EDREQCGVKN DESSSSSIIF AEPTPEKEKR FGESDTENQN

301 NKSVKYTTNL VIRESGELES TLELQENGLA GLSASSIVEQ QLPLRRNSHL

351 EESFTSTEES SEENVNFLGQ TEAQYHLMLH IQMQLCELSL WDWIVERNKR

401 GREYVDESAC PYVMANVATK IFQELVEGVF YIHNMGIVHR DLKPRNIFLH

451 GPDQQVKIGD FGLACTDILQ KNTDWTNRNG KRTPTHTSRV GTCLYASPEQ

501 LEGSEYDAKS DMYSLGVVLL ELFQPFGTEM ERAEVLTGLR TGQLPESLRK

551 RCPVQAKYIQ HLTRRNSSQR PSAIQLLQSE LFQNSGNVNL TLQMKIIEQE

601 KEIAELKKQL NLLSQDKGVR DDGKDGGVG

BLASTP hits

No BLASTP hits available
Alert BLASTP hits for DKFZphamy2_lil, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphamy2_lil, frame 1

Report for DKFZphamy2_lil.1



[LENGTH] 646

[MW] 72738.76

[pI] 5.80

[HOMOL] SWISSNEW:HRI_MOUSE HEME-REGULATED EUKARYOTIC INITIATION FACTOR EIF-2-ALPHA KINASE (EC 2.7.1.-) (HEME-REGULATED INHIBITOR) (HRI) (HEME-CONTROLLED REPRESSOR) (HCR) (HEMIN-SENSITIVE INITIATION FACTOR-2 ALPHA KINASE). 0.0

[FUNCAT] 05.07 translational control [S. cerevisiae, YDR283c] 2e-13
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDR283c] 2e-13
[FUNCAT] 10.02.11 key kinases [S. cerevisiae, YOR231w] 8e-11
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YOR231w] 8e-11
[FUNCAT] 03.01 cell growth [S. cerevisiae, YOR231w] 8e-11
[FUNCAT] 11.01 stress response [S. cerevisiae, YOR231w] 8e-11
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YOR231w] 8e-11
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKL101w] 8e-12
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YPL150w] 8e-12
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR523c] 2e-11
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YDR523c] 2e-11
[FUNCAT] 09.01 biogenesis of cell wall [S. cerevisiae, YPL140c] 1e-11
[FUNCAT] 10.03.11 key kinases [S. cerevisiae, YCR073c] 1e-11
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YHR082c] 1e-10
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YLR362w] 2e-10
[FUNCAT] 10.05.11 key kinases [S. cerevisiae, YLR362w] 2e-10
[FUNCAT] 10.01.11 key kinases [S. cerevisiae, YLR362w] 2e-10
[FUNCAT] 10.99 other signal-transduction activities [S. cerevisiae, YDL101c] 3e-10
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YDL101c] 3e-10
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YDR507c] 3e-10
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YDL108w] 1e-09
[FUNCAT] 03.16 dna synthesis and replication [S. cerevisiae, YBR160w] 1e-09
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YLR113w] 1e-09
[FUNCAT] 02.19 metabolism of energy reserves (glycogen, trehalose) [S. cerevisiae, YPL031c] 1e-08
[FUNCAT] 04.05.01.04 transcriptional control [S. cerevisiae, YPL031c] 1e-08
[FUNCAT] 01.04.04 regulation of phosphate utilization [S. cerevisiae, YPL031c] 1e-08
[FUNCAT] c energy conversion [M. genitalium, (IGimi] 2e-08
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YOR351c] 1e-07
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YPL153c] 1e-07
[FUNCAT] 10.05.04 regulation of g-protein activity [S. cerevisiae, YBL016w] 7e-07
[FUNCAT] 04.03.99 other trna-transcription activities [S. cerevisiae, YIL035c] 1e-06
[FUNCAT] 08.13 vacuolar transport [S. cerevisiae, YGL180w] 1e-06
[FUNCAT] 06.13.04 lysosomal and vacuolar degradation [S. cerevisiae, YGL180w] 1e-06
[FUNCAT] 04.99 other transcription activities [S. cerevisiae, YERlE^w] 2e-06
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YDR122w] 2e-06
[FUNCAT] 30.07 organization of endoplasmatic reticulum [S. cerevisiae, YHR079c] 3e-06
[FUNCAT] 01.06.10 regulation of lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YHR079c] 3e-06
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YKLnflc] 1e-05
[FUNCAT] 10.04.99 other nutritional-response activities [S. cerevisiae, YKLnflc] 1e-05
[FUNCAT] biogenesis of cytoskeleton [S. cerevisiae, YNL020c] <1e-05
[FUNCAT] 06.07 protein modification (glycosylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YFL033c] 4e-04
[FUNCAT] 01.02.04 regulation of nitrogen and sulphur utilization [S. cerevisiae, YNL183c] 7e-04

[BLOCKS] BL00107A Protein kinases ATP-binding region proteins

[SCOP] d1ir3a_ 5.1.1.2.6 insulin receptor Complex (transferase/substrate) 1e-28
[SCOP] d1fgkb_ 5.1.1.2.5 Fibroblast growth factor receptor 1 [human (Hom 9e-27
[SCOP] d1phk 5.1.1.1.6 gamma-subunit of glycogen phosphorylase kinas 2e-23
[SCOP] d1a6o 5.1.1.1.14 Protein kinase CK2, alpha subunit [Maize (Ze 1e-23
[SCOP] d3lck 5.1.1.2.2 Lymphocyte kinase (lck) [Human (Homo sapiens) 3e-22
[SCOP] d2erk 5.1.1.1.11 MAP kinase Erk2 [rat (Rattus norvegicus) 7e-20
[SCOP] d1cdkb_ 5.1.1.1.2 cAMP-dependent PK, catalytic subunit Comple be-n
[SCOP] d1hcl 5.1.1.1.1 Cyclin-dependent PK [Human (Homo sapiens) 5e-21

[EC] 2.7.1.112 Protein-tyrosine kinase 1e-08
[EC] 2.7.1.126 beta-Adrenergic-receptor kinase 2e-06
[EC] 2.7.1.117 Myosin-light-chain kinase 1e-09
[EC] 2.7.1.37 Protein kinase 5e-12
[EC] 2.7.1.123 Ca2+/calmodulin-dependent protein kinase 4e-

[PIRKW] phosphotransferase
[PIRKW] nucleus 9e-09
[PIRKW] RNA binding 2e-21
[PIRKW] duplication 8e-10
[PIRKW] tandem repeat 4e-09
[PIRKW] zinc 5e-12
[PIRKW] cell cycle control 2e-09
[PIRKW] serine/threonine-specific protein kinase 0.0
[PIRKW] transmembrane protein 2e-09
[PIRKW] zinc finger 8e-10
[PIRKW] oncogene 6e-12
[PIRKW] autophosphorylation 0.0
[PIRKW] coat protein 1e-11
[PIRKW] magnesium 9e-09
[PIRKW] ATP 0.0
[PIRKW] polyprotein 6e-12
[PIRKW] receptor Te-m
[PIRKW] phosphoprotein 0.0
[PIRKW] sporulation 2e-09
[PIRKW] glycoprotein 9e-09
[PIRKW] growth factor receptor 9e-11
[PIRKW] signal transduction 2e-12
[PIRKW] serine/threonine/tyrosine-specific protein kinase 5e-10
[PIRKW] protein kinase 8e-10
[PIRKW] transforming protein 2e-12
[PIRKW] heme binding 0.0
[PIRKW] purine nucleotide binding 2e-10
[PIRKW] calcium binding He-m
[PIRKW] meiosis 1e-08
[PIRKW] alternative splicing 1e-11
[PIRKW] P-loop 2e-10
[PIRKW] proto-oncogene 2e-12
[PIRKW] segmentation ^Je-lO
[PIRKW] stress-induced protein 1e-09
[PIRKW] EF hand 4e-09
[PIRKW] cell division 1e-09
[PIRKW] calmodulin binding 4e-09

[SUPFAM] LIM protein kinase 8e-10
[SUPFAM] calcium-dependent protein kinase 4e-09
[SUPFAM] rat protein kinase raf 5e-15
[SUPFAM] AMP-activated protein kinase 2e-08
[SUPFAM] protein kinase byr2 5e-09
[SUPFAM] SH2 homology 1e-08
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 0.0
[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 9e-09
[SUPFAM] double-stranded RNA-binding repeat homology 2e-21
[SUPFAM] histidine-tRNA ligase homology 6e-42
[SUPFAM] SAM homology 5e-09
[SUPFAM] avian retrovirus IC10 gag-Rmil-env polyprotein 1e-11
[SUPFAM] LIM metal-binding repeat homology 8e-10
[SUPFAM] GCN2 protein 6e-42
[SUPFAM] protein kinase homology 0.0
[SUPFAM] protein kinase C zinc-binding repeat homology 2e-12
[SUPFAM] Ca2+/calmodulin-dependent protein kinase II 4e-08
[SUPFAM] beta-adrenergic-receptor kinase 2e-08
[SUPFAM] kinase-related transforming protein 6e-12
[SUPFAM] protein kinase A-raf 2e-12
[SUPFAM] SH3 homology 1e-08
[SUPFAM] Ca2+/calmodulin-dependent protein kinase 4e-01
[SUPFAM] protein kinase Xa21 1e-09
[SUPFAM] calmodulin repeat homology iJe-DT
[SUPFAM] protein kinase DUN1 1e-09
[SUPFAM] pleckstrin repeat homology 1e-09
[SUPFAM] protein kinase TIK 2e-21
[SUPFAM] protein-tyrosine kinase tec 1e-08
[SUPFAM] kinase interaction domain homology 1e-04

[PROSITE] PROTEIN_KINASE_ATP 2
[PROSITE] PROTEIN_KINASE_ST 1
[PFAM] Eukaryotic protein kinase domain

[KW] Irregular
[KW] 3D
[KW] LOW_COMPLEXITY 10.99 %
[KW] COILED_COIL 5.26 %

SEQ AVLGWPAGWAAARARPAMQGGNSGVRKREEEGDGAGAVAAPPAIDFPAEGPDPEYDESDV

SEG ... xxxxxxxxxxxxxx xxxxxxxxxxxxxxx ...

COILS

1jstA

SEQ PAEIQVLKEPLQQPTFPFAVANQLLLVSLLEHLSHVHEPNPLRSRQVFKLLCQTFIKMGL

SEG xxxxxxxxxxxxxxx

COILS

1jstA

SEQ LSSFTCSDEFSSLRLHHNRAITHLMRSAKERVRQDPCEDISRIQKIRSREVALEAQTSRY

SEG

COILS

1jstA

SEQ LNEFEELAILGKGGYGRVYKVRNKLDGQYYAIKKILIKGATKTVCMKVLREVKVLAGLQH

SEG

COILS

1jstA

TTTEEEEEECCCBTTBCEEEEEETTTTCEEEEEEECCTTTTTTTTHHHHHHHHHHHTTTB

SEQ PNIVGYHTAWIEHVHVIQPRDRAAIELPSLEVLSDQEEDREQCGVKNDESSSSSIIFAEP

SEG

COILS

1jstA

TTBC

SEQ TPEKEKRFGESDTENQNNKSVKYTTNLVIRESGELESTLELQENGLAGLSASSIVEQQLP
SEG

1jstA

SEQ LRRNSHLEESFTSTEESSEENVNFLGQTEAQYHLMLHIQMQLCELSLWDWIVERNKRGRE

SEG xxxxxxxxxxxxx

COILS

1jstA

SEQ YVDESACPYVMANVATKIFQELVEGVFYIHNMGIVHRDLKPRNIFLHGPDQQVKIGDFGL

SEG

COILS

1jstA

SEQ ACTDILQKNTDWTNRNGKRTPTHTSRVGTCLYASPEQLEGSEYDAKSDMYSLGVVLLELF

SEG

COILS

1jstA

SEQ QPFGTEMERAEVLTGLRTGQLPESLRKRCPVQAKYIQHLTRRNSSQRPSAIQLLQSELFQ

SEG

COILS

1jstA

SEQ NSGNVNLTLQMKIIEQEKEIAELKKQLNLLSQDKGVRDDGKDGGVG

SEG xxxxxxxxxxxxxx

COILS ..CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

1jstA



Prosite for DKFZphamy2_lil.1

190->214 PROTEIN_KINASE_ATP PS00107 PDOC00100
190->215 PROTEIN_KINASE_ATP PS00107 PDOC00100
454->467 PROTEIN_KINASE_ST PS00108 PDOC00100
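PROSITE patterns such as PROTEIN_KINASE_ST can be scanned with an ordinary regular-expression engine once the PROSITE syntax is translated. The sketch below handles the common elements (x, [..], {..}, (n) and (n,m) repeats, and the terminal anchors < and >) but not < or > inside brackets. The PS00108 pattern used in the usage example is quoted from memory of the PROSITE entry and should be checked against the current release. With it, the signature falls at residues 437-449 of the ORF, which is 454 onward in the Pedant numbering that includes the 17 upstream residues.

```python
import re

_ELEMENT = re.compile(
    r"(<?)"                               # optional N-terminal anchor
    r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})"    # residue class
    r"(?:\((\d+)(?:,(\d+))?\))?"          # optional repeat (n) or (n,m)
    r"(>?)"                               # optional C-terminal anchor
)


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern string into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_anchor, core, low, high, c_anchor = m.groups()
        if core == "x":
            core = "."                            # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"        # forbidden residues
        if low is not None:
            core += "{" + low + ("," + high if high is not None else "") + "}"
        parts.append(("^" if n_anchor else "") + core + ("$" if c_anchor else ""))
    return "".join(parts)


# Residues 431-450 of the ORF scanned with the (recalled) PS00108 pattern.
st = prosite_to_regex("[LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).")
hit = re.search(st, "YIHNMGIVHRDLKPRNIFLH")
print(431 + hit.start(), hit.group())
```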



Pfam for DKFZphamy2_lil.1

HMM_NAME Eukaryotic protein kinase domain

HMM *YeigRiIGeGsFGtVYkCiUr.TGeIVAIKIIk-krsnis F1REI
+E + I+G+G++G+VYK++++ +G+ +AIK+I K ++ +LRE+
Query 184 FEELAILGKGGYGRVYKVRNKLDGQYYAIKKILIKGATKTVCMKVLREV 232

HMM qlMRrLnHPNIIRFYDwFedddDHI*
++++ L+HPNI+ + +++ ++ H+
Query 233 KVLAGLQHPNIVGYHTAWI-EHVHV 256

HMM *IYriII1EYrieGGDLFDYIrrng pMsEwelrf IllyfilL
+++ H+++E +L+D+I++++ ++ + + +1+ +++
Query 396 LHIQMQLCEL-SLWDWIVERNKRGREYVDESACPYVMANVATKIFQELV 443

HMM pGHeYLHSMgllHRDLKPENILIDeN.gqlKIcDFGLARqMn
+G+ Y+H+MGI+HRDLKP+NI++ + Q+KI+DFGLA +
Query 444 EGVFYIHNMGIVHRDLKPRNIFLHGPDQQVKIGDFGLACTDILQKNTDWT 493

HMM nYernttfCGTPWYnnAPEVIImgnyYttkVDNUSFGCILUEMriT
+ T+++GT y +PE ++G++Y+ K+DM+S+G++L E+ +
Query 494 NRNGKRTPTHTSRVGTCLYA-SPEQ-LEGSEYDAKSDMYSLGVVLLELF- 540

HMM GepPFyd. . dnMemlmrliqr - f rrpf UpnCSeElyDFMrwCUnyDPekR
+PF ++ E + ++ + ++ ++ +c+ +++ + + +++ ++R
Query 541 --QPFGTEMERAEVLTGLRTGQLPESLRKRCPVQAKYIQ-HLTRRNSSQR 587

HMM PTFriJILnHPUF*
P++ Q+L++ F
Query 588 PSAIQLLQSELF 599
group: transmembrane proteins

DKFZphamy2_lil4 encodes a novel 617 amino acid protein with
similarity to the human l(3)mbt protein homolog.

Mutations of the Drosophila l(3)mbt gene lead to malignant brain
tumors. The novel protein contains 1 transmembrane domain.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of oncogenes and amygdala-specific genes and as a new
marker for amygdala cells.

similarity to Human l(3)mbt protein homolog mRNA

> 14 exons (HS756G23 (EMBLNEW))
Pedant: TRANSMEMBRANE 1

Sequenced by MediGenomix

Locus: /map="22q13.31-13.33"

Insert length: 3071 bp
Poly A stretch at pos. 3052, no polyadenylation signal found



1 GGCAGGCCAA TATGGCTTCC TGCACCTGGT GACGCTTGGC GAAACTGAGG
51 TCTCATGGAG AAGCCCCGGA GTATTGAGGA GACCCCATCT TCAGAACCAA
101 TGGAGGAAGA GGAAGATGAC GACTTGGAGC TGTTTGGTGG CTATGATAGT
151 TTCCGGAGTT ATAACAGCAG TGTGGGCAGT GAGAGCAGCT CCTATCTGGA
201 GGAGTCAAGT GAAGCAGAAA ATGAGGATCG GGAAGCAGGG GAACTGCCGA
251 CCTCCCCGCT GCATTTGCTC AGCCCTGGGA CTCCTCGCTC CTTGGATGGC
301 AGTGGTTCTG AGCCAGCTGT CTGTGAGATG TGTGGTATCG TGGGTACAAG
351 GGAAGCCTTC TTCTCCAAGA CCAAGAGGTT CTGCAGCGTC TCCTGCTCCA
401 GGAGCTACTC CTCCAACTCC AAGAAAGCCA GTATCTTGGC TAGATTACAG
451 GGAAAACCAC CGACCAAAAA AGCCAAAGTC CTGCACAAGG CTGCCTGGTC
501 TGCCAAAATT GGAGCCTTCC TCCACTCTCA AGGGACAGGA CAGCTGGCAG
551 ATGGGACACC AACAGGACAA GACGCTCTGG TCTTGGGCTT CGACTGGGGG
601 AAGTTCCTGA AGGATCACAG TTACAAGGCT GCTCCCGTCA GCTGTTTCAA
651 GCACGTCCCA CTCTATGACC AGTGGGAGGA TGTGATGAAA GGGATGAAGG
701 TGGAGGTGCT CAACAGTGAT GCTGTGCTCC CCAGCCGGGT GTACTGGATC
751 GCCTCTGTCA TCCAGACAGC AGGGTATCGG GTGCTGCTTC GGTATGAAGG
801 CTTTGAAAAT GACGCCAGCC ATGACTTCTG GTGCAACCTG GGAACAGTGG
851 ATGTCCACCC CATTGGCTGG TGTGCCATCA ACAGCAAGAT CCTAGTGCCC
901 CCACGGACCA TCCATGCCAA GTTCACCGAC TGGAAGGGCT ACCTCATGAA
951 ACGGCTGGTG GGCTCCAGGA CGCTTCCCGT GGATTTCCAC ATCAAGATGG
1001 TGGAGAGCAT GAAGTACCCC TTTAGGCAGG GCATGCGGCT GGAAGTGGTG
1051 GACAAGTCCC AGGTGTCACG CACTCGCATG GCTGTGGTGG ACACAGTAAT
1101 CGGGGGTCGC CTACGGCTCC TCTACGAGGA TGGTGACAGT GACGACGACT
1151 TCTGGTGCCA CATGTGGAGC CCCCTGATCC ACCCAGTGGG TTGGTCACGA
1201 CGTGTGGGCC ACGGCATCAA GATGTCAGAG AGGCGAAGTG ACATGGCCCA
1251 TCACCCCACC TTCCGGAAGA TCTACTGTGA TGCCGTTCCT TACCTCTTCA
1301 AGAAGGTACG AGCAGTCTAC ACAGAAGGCG GTTGGTTTGA GGAAGGGATG
1351 AAGCTGGAGG CCATTGACCC CCTGAATCTG GGCAACATCT GCGTGGCAAC
1401 TGTCTGTAAG GTTCTCCTGG ATGGATACCT GATGATCTGT GTGGACGGGG
1451 GGCCCTCCAC AGATGGCTTG GACTGGTTCT GCTACCATGC CTCTTCCCAC
1501 GCCATCTTCC CGGCCACCTT CTGTCAGAAG AATGACATTG AGCTCACACC
1551 GCCAAAAGGT TATGAGGCAC AGACTTTCAA CTGGGAGAAC TACTTGGAGA
1601 AGACCAAGTC GAAAGCCGCT CCATCGAGAC TCTTTAACAT GGATTGCCCA
1651 AACCATGGCT TCAAGGTGGG CATGAAGCTG GAGGCCGTGG ACCTGATGGA
1701 GCCCCGGCTC ATCTGTGTGG CCACGGTGAA ACGAGTGGTG CATCGGCTCC
1751 TCAGCATCCA CTTTGACGGC TGGGACAGCG AGTACGACCA GTGGGTGGAC
1801 TGCGAGTCCC CAGACATCTA CCCCGTCGGC TGGTGTGAGC TCACCGGCTA
1851 CCAGCTCCAG CCTCCTGTGG CCGCAGGTGT GGGCTCTCGT GGCCCTAAGA
1901 GGCTCTGACT TTCTTTCCTC TTCTTTTTTC CTTCTTCCCC CGCCCCTGTG
1951 CCCATCTCCG TTCTTTGGCA TGAGGTGGAG ATGTCTCATG GACCACTTTA
2001 AGTAGAGAGT GAGCCCCGTC ACCCAGCCCC TGCTCCTGAC TTCTCTGTCT
2051 CCCTTTCCCT CTGGCCTGCA GAGCTCCTTC CTTCATCTTG CCCACTCTGT
2101 CATATGTTCG TGCCCTTGTG CACCCAGGTA AACTACCCAG GTCCCTCTGA
2151 GCAGCCCTGG TAACAAGGGT GSGAAGAAGG GACAGCTGTT CTCCGGCCCC
2201 TCCTCCAGCC CCGCCCTCTC CTCATTGCCC AGGTTTGGCT TCCTGTCTTG
2251 GGGTGTCTCG TGTGGGAGGG TGGATGGGGT CTCGGGATGC GCCTGTGCCC
2301 TGTGTCCTCC CAGGGACCCT CTTCTCATCT CTTTCACCCT TGTCTTTCAA
2351 CAACAGAACC GGCCACACCG CTGAAGGCCA AAGAGGCCAC AAAGAAGAAA
2401 AAGAAACAGT TTGGGAAGAA AAGGAAAAGA ATCCCGCCCA CTAAGACGCG
2451 ACCCCTCAGA CAGGGGTCCA AGAAGCCCCT GCTGGAGGAC GACCCTCAGG
2501 GTGCCAGGAA GATCTCGTCG GAGCCTGTTC CTGGCGAGAT CATTGCTGTG
2551 CGTGTGAAGG AAGAGCATCT AGACGTGGCC TCGCCCGACA AGGCTTCAAG
2601 TCCAGAGCTG CCTGTCTCCG TCGAGAACAT CAAGCAGGAA ACAGACGACT
2651 GAGCCTTCCT GCCTCCAGCC TGGCTTCTAG CTGGAAGCCA GCCCAGCGTT
2701 TCTCTACCAC CACCACCATG CCTCCACCTG ACTTTGGCTT GGAGACTGAT
2751 CCTCTCTGTG TAAATTCTGC CCGGTGCTGT GAAGGCTGGA CGGTGGAGGA
2801 CCTGCTGGGG TCTCCTGGGA CCCGCCTGTT GCTTCTGCCC TCCCCTGTGG
2851 AAAGGTCTAT ATGACGGGCC GCCTGAGGCC CCAGAACTCG TCTGTGAACC
2901 ACCTTTTCCA GCCAGAGTTC CCAAAGCTGG AACGCTAGCT GCCTGCTCTT
2951 CCTTAAGATG GCCTCCCCCC GACCCGCCAC GGCCCTCAGT TGCCAGGGAT
3001 GGGGCCACCA CTGTCACACT GTGGAATACA AGACAGTGAA CTCTGTCTGC
3051 CTAAAAAAAA AAAAAAAAAA A



BLAST Results 

Entry HS756G23 from database EMBLNEW:

Human DNA sequence from clone 756G23 on chromosome 22q13.31-13.33
Score = 3131, P = 0.0e+00, identities = 875/954

Entry U89358_1 from database TREMBL:

product: "l(3)mbt protein homolog"; Human l(3)mbt protein homolog mRNA, complete cds.
Score = 505, P = 7.5e-45, identities = 153/350, positives = 170/350, frame +1

Entry AB014581_1 from database TREMBL:
gene: "KIAA0681"; product: "KIAA0681 protein"; Homo sapiens mRNA for KIAA0681 protein, partial cds.
Score = 503, P = 1.4e-46, identities = 122/307, positives = 163/307, frame +1



Medline entries 



No Medline entry



Peptide information for frame 1 

ORF from 55 bp to 1905 bp; peptide length: 617
Category: similarity to known protein
Classification: unclassified

1 MEKPRSIEET PSSEPMEEEE DDDLELFGGY DSFRSYNSSV GSESSSYLEE

51 SSEAENEDRE AGELPTSPLH LLSPGTPRSL DGSGSEPAVC EMCGIVGTRE

101 AFFSKTKRFC SVSCSRSYSS NSKKASILAR LQGKPPTKKA KVLHKAAWSA

151 KIGAFLHSQG TGQLADGTPT GQDALVLGFD WGKFLKDHSY KAAPVSCFKH

201 VPLYDQWEDV MKGMKVEVLN SDAVLPSRVY WIASVIQTAG YRVLLRYEGF

251 ENDASHDFWC NLGTVDVHPI GWCAINSKIL VPPRTIHAKF TDWKGYLMKR

301 LVGSRTLPVD FHIKMVESMK YPFRQGMRLE VVDKSQVSRT RMAVVDTVIG

351 GRLRLLYEDG DSDDDFWCHM WSPLIHPVGW SRRVGHGIKM SERRSDMAHH

401 PTFRKIYCDA VPYLFKKVRA VYTEGGWFEE GMKLEAIDPL NLGNICVATV

451 CKVLLDGYLM ICVDGGPSTD GLDWFCYHAS SHAIFPATFC QKNDIELTPP

501 KGYEAQTFNW ENYLEKTKSK AAPSRLFNMD CPNHGFKVGM KLEAVDLMEP

551 RLICVATVKR VVHRLLSIHF DGWDSEYDQW VDCESPDIYP VGWCELTGYQ
601 LQPPVAAGVG SRGPKRL




BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_lil4, frame 1

TREMBL:AB014581_1 gene: "KIAA0681"; product: "KIAA0681 protein"; Homo sapiens mRNA for KIAA0681 protein, partial cds., N = 1, Score = 503, P = 3.1e-48

TREMBL:U89358_1 product: "l(3)mbt protein homolog"; Human l(3)mbt protein homolog mRNA, complete cds., N = 1, Score = 505, P = 6.2e-48

>TREMBL:U89358_1 product: "l(3)mbt protein homolog"; Human l(3)mbt protein homolog mRNA, complete cds.
Length = 772
HSPs:

Score = 505 (75.8 bits), Expect = 6.2e-48, P = 6.2e-48
Identities = 123/313 (39%), Positives = 170/313 (54%)

Query: 293 WKGYLMKRLVGSRTLPVDFH--IKMVESMKYPFRQGMRLEVVDKSQVSRTRMAVVDTVIG 350

U+ YL ++ + T PV + V K F + GM+LE +D S + 

10 V V G 

Sbjct: 206 WESYLEEflK— 

AITAPVSLFfiDSflAVTHNKNGFKLGMKLEGIDPfiHPSriYFILTVAEVCG Eb5 

Query: 351 GRLRLLYEDGDSD-DDFWCHMWSPLIHPVGWSRRVGHGIKMSE--RRSDMAHHPTFRKIY 407

RLRL + DG S+ DFW + SP IHP GUI + GH +++ + + + + 
Sbjct: Ebb YRLRLHF- 

DGYSECHDFUVNANSPDIHPAGWFEKTGHKL(2LPKGYKEEEFSli)S(2YMCSTR 3S4 

Query: 408 CDAVP-YLFKKVRAVYTEGGWFEEGMKLEAIDPLNLGNICVATVCKVLLDGYLMICVDGG 466

A P ++F G F+ GilKLEA+D +N +CVA+V V+ D 

++ D 

Sbjct: 3ES AdAAPKHHFVSflSHSPPPLG-FflVGMKLEAVDRt1NPSLVCVASVTDVV- 
25 DSRFLVHFDNU 3S2 

Query: 467 PSTDGLDWFCYHASSHAIFPATFCQKNDIELTPPKGY-EAQTFNWENYLEKTKSKAAPSR 525

T D++C SS I P +C<3K LTPP+ Y + F UE YLE+T 

30 .+ A P+ 

Sbjct: 383 DDT — YDYUC- 

DPSSPYIHPVGUC(2K(JGKPLTPPj2DYPDPDNFCUEKYLEETGASAVPTU 43=1 
Query: 526 LFNMDCPNHGFKVGMKLEAVDLMEPRLICVATVKRVVHRLLSIHFDGWDSEYDQWVDCES 585

F + P H F V MKLEAVD P LI VA+V+ V + IHFDGU 

YD U+D + 

Sbjct: 440 AFKVR- 

PPHSFLVNMKLEAVDRRNPALIRVASVEDVEDHRIKIHFDGUSHGYDFWIDADH 4^ 


Query: 586 PDIYPVGWCELTGYQLQPPV 605

PDI+P GWC TG+ LAPP+ 
Sbjct: 4<H PDIHPAGWCSKTGHPLflPPL Slfl 
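Each HSP header summarizes the midline of its alignment: letters mark identical residues, '+' marks conservative substitutions, and both count toward Positives. The percentages in this section are consistent with integer truncation rather than rounding (for example, 151/324 = 46.6 % appears as 46 %). A minimal sketch, assuming that truncation convention; `alignment_stats` and `blast_percent` are illustrative helpers:

```python
from typing import Tuple


def alignment_stats(midline: str) -> Tuple[int, int]:
    """Count (identities, positives) in a BLAST alignment midline.

    Letters mark identical residues; '+' marks conservative substitutions.
    Positives include the identities.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def blast_percent(count: int, length: int) -> int:
    """Integer percentage truncated toward zero (assumed convention)."""
    return 100 * count // length


print(blast_percent(123, 313), blast_percent(170, 313))  # 39 54
```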

Score = 333 (50.0 bits), Expect = 4.1e-27, P = 4.1e-27
Identities = 103/324 (31%), Positives = 151/324 (46%)

Query: 179 FDWGKFLKDHSYKAAPVSCFKHVPLYDQWEDVMK-GMKVEVLNSDAVLPSRVYWIASVIQ 237
50 + Id +L++ APVS F+ ++ K GPIK+E + D PS 

+Y+I +V + 

Sbjct : 20b USUESYLEEflKAITAPVSLFflDSflAVTHNKNGFKLGHKLEGI — DPflHPS- 
MYFILTVAE SbE 

Query: 238 TAGYRVLLRYEGFENDASHDFWCNLGTVDVHPIGWCAINSKILVPPRTIHAKFTDWKGYL 297
GYR+ L ++G+ HDFId N + D+HP Gbl L P+ + 

Id Y+ 
Sbjct: 263 VCGYRLRLHFDGYSE--

CHDFIdVNANSPDIHPAGUFEKTGHKLflLPKGYKEEEFSUSlJYII 3SQ 

Query: 298 MKRLVGSRTLPVDFHIKMVESMKYP---FRQGMRLEVVDKSQVSRTRMAVVDTVIGGRLR 354

+R H+ + +S P F+ 6n+LE VD+ S +A V 

V+ R 

Sbjct: 321 CS 

TRAfiAAPKHMFVSflSHSPPPLGFflVGMKLEAVDRnNPSLVCVASVTDVVDSRFL 37b 


Query: 355 LLYEDGDSDDDFWCHMWSPLIHPVGWSRRVGHGIKMSERRSDMAHHPTFRKIYCDAV 411

+ +++ D D+WC SP IHPVGU ++ G + + D 

+ AV 
Sbjct: 377

VHFDNWDDTYDYUCDPSSPYIHPVGUCdKdGKPLTPPdDYPDPDNFCUEKYLEETGASAV 436



Query: 412

PYLFKKVRAVYTEGGUFEEGHKLEAIDPLNLGNICVATVCKVLLDGYLMICVDGGPSTDG 471
P KVR ++ F MKLEA+D N I VA+V V D + I

DG + G 

Sbjct: 437 PTUAFKVRPPHSFLVNNKLEAVDRRNPALIRVASVEDVE-

DHRIKIHFDGU--SHG 489



Query: 472 LDUFCYHASSHAIFPATFCflKNDIELTPPKG 502

J> F A I PA +C K L PP G
Sbjct: 490 YD-FbllDADHPDIHPAGliKSKTGHPLflPPLG 519

Score = 236 (35.4 bits), Expect = 2.5e-16, P = 2.5e-16
Identities = 47/110 (42%), Positives = 66/110 (60%)



Query: 499 PPKGYEAfiTFNlilENYLEKTKSKAAPSRLF-NflDCPNH---
GFKVGHKLEAVDLMEPRLIC 554

p q + + ++UE+YLEI+ K+ AP LF + H GFK + GMKLE +D 

P +

Sbjct: 197

PATGEKKECUSUESYLEE<3KAITAPVSLFflDSflAVTHNKNGFKLGMKLEGIDPflHPSMYF 256



Query: 555 VATVKRVVHRLLSIHFDGWDSEYMUVDCESPDIYPVGUCELTGYfiLflPP 604

+ TV V L +HFI>G+ +D UV+ SPDI+P GU E TG++Li2 P
Sbjct: 257 ILTVAEVCGYRLRLHFDGYSECHDFWVNANSPDIHPAGlJFEKTGHKLflLP 306




Pedant information for DKFZphamy2_1i14, frame 1



Report for DKFZphamy2_1i14.1


[LENGTH] 617

[MW] b12b4-ll

[pI] 6.05

[HOMOL] TREMBL:U89358_1 product: "l(3)mbt protein homolog"; Human l(3)mbt protein homolog mRNA, complete cds. 1e-47

[BLOCKS] BL01206A Amiloride-sensitive sodium channels proteins

[KW] TRANSMEMBRANE 1

[KW] LOW_COMPLEXITY L40 %



SEQ MEKPRSIEETPSSEPMEEEEDDDLELFGGYDSFRSYNSSVGSESSSYLEESSEAENEDRE

SEG xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccceeeeccccccccccccccccccccccccccccccccccccccccccccccccccc 

MEM 

SEQ AGELPTSPLHLLSPGTPRSLDGSGSEPAVCEMCGIVGTREAFFSKTKRFCSVSCSRSYSS

SEG xxxxxxxxxx 

PRD ccccccccccccccccccccccccccceeeeeecccccccccccccccceeeeccccccc 

MEM 

SEQ NSKKASILARL(2GKPPTKKAKVLHKAAUSAKIGAFLHS<3GTG(2LADGTPTG<3DALVLGFD

SEG xxxxxx 

PRD ccchhhhhhhhhcccccccchhhhhhhhhhhhhhhcccccccccccccccccceeeeecc 

MEM 

SEQ UGKFLKDHSYKAAPVSCFKHVPLYDflUEDVMKGMKVEVLNSDAVLPSRVYWIASVIflTAG

SEG 

PRD chhhhhhccccccccccccccccccccchhhhhheeeeeccccccceeeehhhhhhhhhc 

MEM 

SEQ YRVLLRYEGFENDASHDFUCNLGTVDVHPIGUCAINSKILVPPRTIHAKFTDWKGYLMKR

SEG 

PRD ceeeeeeccccccccceeeeccccccccccccccccceeeccccccccccccchhhhhhh. 

MEM 

SEQ LVGSRTLPVDFHIKMVESMKYPFR<2GMRLEVVDKS(3VSRTRMAVVDTVIGGRLRLLYEDG

SEG 

PRD hccccccccccccccccccccccccccceeeecccccceeeeeeeeeccccceeeeeccc 

MEM 

SEQ DSDDDFUCHMUSPLIHPVGUSRRVGHGIKMSERRSDMAHHPTFRKIYCDAVPYLFKKVRA

SEG 

PRD cccceeeeeccccccccccccccccccccccccccccccchhhhhhcccccccccccccc 

MEM 

SEQ VYTEGGUFEEGMKLEAIDPLNLGNICVATVCKVLLDGYLMICVDGGPSTDGLDUFCYHAS

SEG 

PRD ccccccchhhhheeeeccccccceeeeeeehhhhhcceeeeeeccccccccceeeeeecc 

MEM MMMMMMMMMMMMMMMMM 

SEQ SHAIFPATFCflKNDIELTPPKGYEAtJTFNUENYLEKTKSKAAPSRLFNMDCPNHGFKVGM

SEG 

PRD cccccccccccccccccccccccccchhhhhhhhhhhhccccccccccccccchhhhhhe 

MEM 

SEQ KLEAVDLMEPRLICVATVKRVVHRLLSIHFDGblDSEYDfllilVDCESPPIYPVGliICELTGYfi

SEG 

PRD eeecccccccceeeeeehhhhhhhheeeeeccccccccccccccccccccceeeeccccc 

MEM 

SEQ LflPPVAAGVGSRGPKRL

SEG 

PRD ccccccccccccccccc 

MEM 
group: differentiation/development

DKFZphamy2_1i24 encodes a novel 835 amino acid protein with
partial similarity to Rattus norvegicus Notch2 protein.

Notch family molecules are thought to be negative regulators of
neuronal differentiation in early brain development. Notch2 is
expressed not only by neuronal cells in the embryonic brain, but
also by glial cells in the postnatal brain. The new protein
represents a new member of this family and may be involved in
specific differentiation or developmental pathways of the nervous
system.

The new protein can find application in modulating development 
and differentiation of amygdala cells. 


putative protein 
probably complete cds.


Sequenced by MediGenomix 

Locus: unknown 

Insert length: 2768 bp

Poly A stretch at pos. 2714, polyadenylation signal at pos. 2fc>17



1 AGAAATCTTC AGCCAAACAG CTGCAGGAAG TAGAGAAGGT TAAACCCCAG 

51 AGTGAGAAAG TTCATCAGAC TCTGATTCTG GACCCAGCAC AGAGGAAGAG

101 ACTCCAGCAG CAGATGCAGC AGCACGTTCA GCTCTTGACC CAAATCCACC 

151 TTCTTGCCAC CTGCAACCCC AACCTCAATC CGGAGGCCAC TACCACCAGG 

201 ATATTTCTTA AAGAGCTGGG AACCTTTGCT CAAAGCTCCA TCGCCCTTCA 

251 CCATCAGTAC AACCCCAAGT TTCAGACCCT GTTCCAACCC TGTAACTTGA 

301 TGGGAGCTAT GCAGCTGATT GAAGACTTCA GCACACATGT CAGCATTGAC

351 TGCAGCCCTC ATAAAACTGT CAAGAAGACT GCGAATGAAT TTCCCTGTTT

401 GCCAAAGCAA GTGGCTTGGA TTCTGGCCAC AAGCAAGGTT TTCATGTATC

451 CAGAGTTACT TCCAGTGTGT TCCCTGAAGG CAAAGAATCC CCAGGATAAG

501 ATCGTCTTCA CCAAGGCTGA GGACAATTTG TTAGCTTTAG GACTGAAGCA 

551 TTTTGAAGGA ACTGAGTTTC CTAATCCTCT AATCAGCAAG TACCTTCTAA

601 CCTGCAAAAC TGCCCACCAA CTGACAGTGA GAATCAAGAA CCTCAACATG

651 AACAGAGCTC CTGACAACAT CATTAAATTT TATAAGAAGA CCAAACAGCT

701 GCCAGTCCTA GGAAAATGCT GTGAAGAGAT CCAGCCACAT CAGTGGAAGC

751 CACCTATAGA GAGAGAAGAA CACCGGCTCC CATTCTGGTT AAAGGCCAGT 

801 CTGCCATCCA TCCAGGAAGA ACTGCGGCAC ATGGCTGATG GTGCTAGAGA

851 GGTAGGAAAT ATGACTGGAA CCACTGAGAT CAACTCAGAT CGAAGCCTAG

901 AAAAAGACAA TTTGGAGTTG GGGAGTGAAT CTCGGTACCC ACTGCTATTG

951 CCTAAGGGTG TAGTCCTGAA ACTGAAGCCA GTTGCCACCC GTTTCCCCAG

1001 GAAGGCTTGG AGACAGAAGC GTTCATCAGT CCTGAAGCCC CTCCTTATCC 

1051 AACCCAGCCC CTCTCTCCAG CCCAGCTTCA ACCCTGGGAA AACACCAGCC

1101 CGATCAACTC ATTCAGAAGC CCCTCCGAGC AAAATGGTGC TCCGGATTCC 

1151 TCACCCAATA CAGCCAGCCA CTGTTTTACA GACAGTTCCA GGTGTCCCTC 

1201 CACTGGGGGT CAGTGGAGGT GAGAGTTTTG AGTCTCCTGC AGCACTGCCT

1251 GCTGTGCCCC CTGAGGCCAG GACAAGCTTC CCTCTGTCTG AGTCCCAGAC
1301 TTTGCTCTCT TCTGCCCCTG TGCCCAAGGT AATGCTGCCC TCCCTTGCCC
1351 CTTCTAAGTT TCGAAAGCCA TATGTGAGAC GGAGACCCTC AAAGAGAAGA 
1401 GGAGTCAAGG CCTCTCCCTG TATGAAACCT GCCCCTGTTA TCCACCACCC 
1451 TGCATCTGTT ATCTTCACTG TTCCTGCTAC CACTGTGAAG ATTGTGAGCC
1501 TTGGCGGTGG CTGTAACATG ATCCAGCCTG TCAATGCGGC TGTGGCCCAG 
1551 AGTCCCCAGA CTATTCCCAT CACTACCCTC TTGGTTAACC CTACTTCCTT 
1601 CCCCTGTCCA TTGAACCAGT CCCTTGTGGC CTCCTCTGTC TCACCCTTAA
1651 TTGTTTCTGG CAATTCTGTG AATCTTCCTA TACCATCCAC CCCTGAAGAT
1701 AAGGCCCACG TGAATGTGGA CATTGCTTGT GCTGTGGCTG ATGGGGAAAA 
1751 TGCCTTTCAG GGCCTAGAAC CCAAATTAGA GCCCCAGGAA CTATCTCCTC 
1801 TCTCTGCTAC TGTTTTCCCG AAAGTGGAAC ATAGCCCAGG GCCTCCACTA
1851 GCAGATGCAG AGTGCCAAGA AGGATTGTCA GAGAATAGTG CCTGTCGCTG
1901 GACCGTTGTG AAAACAGAGG AGGGGAGGCA AGCTCTGGAG CCGCTCCCTC
1951 AGGGCATCCA GGAGTCTCTA AACAACCCTA CCCCTGGGGA TTTAGAGGAA
2001 ATTGTCAAGA TGGAACCTGA AGAAGCTAGA GAGGAAATCA GTGGATCCCC 
2051 TGAGCGTGAT ATTTGTGATG ACATCAAAGT GGAACATGCT GTGGAATTGG 
2101 ACACTGGTGC CCCAAGCGAG GAGTTGAGCA GTGCTGGAGA AGTAACGAAA 
2151 CAGACAGTCT TACAGAAGGA AGAGGAGAGG AGTCAGCCAA CTAAAACCCC 
2201 TTCATCTTCT CAAGAGCCCC CTGATGAAGG AACCTCAGGG ACAGATGTGA 
2251 ACAAAGGATC ATCAAAGAAT GCTTTGTCCT CAATGGATCC TGAAGTGAGG 
2301 CTTAGTAGCC CCCCAGGGAA GCCAGAAGAT TCATCCAGTG TTGATGGTCA 
2351 GTCAGTGGGG ACTCCAGTTG GGCCAGAAAC TGGAGGAGAG AAGAATGGGC 
2401 CAGAAGAAGA GGAAGAAGAG GACTTTGATG ACCTCACCCA AGATGAGGAA
2451 GATGAAATGT CATCAGCTTC TGAGGAATCT GTGCTTTCTG TCCCAGAACT
2501 CCAGGTGAGA GCTGGAGAAT ATTCTCAAGT ATTTCGTGGA CTCAGTAATA 
2551 TGTATCACTT ATTGATATGC CACCTGCTTG CTTGCTGCAC TATGGATAGT 
2601 CCTAAAATCA TTTGTATTTG ATTTGTGAAT GCATTATGGG ACATGATTGT
2651 GGAGTTGAGG TGAAATGAGA TGGAAAGGAT GAAATTTTAC TTATTATATT
2701 AAACTCGTTT ACACATTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAGA 
2751 AAAAAAAAAA AAAAAAAA 



BLAST Results 



Entry RNNOTCHX from database EMBL:
Rat notch 2 mRNA
Score = 818, P = 1.6e-26, identities = 216/277



Medline entries 



No Medline entry 



Peptide information for frame 3 



ORF from 114 bp to 2618 bp; peptide length: 835
Category: putative protein 

Classification: Differentiation/Development 

1 MQQHVQLLTQ IHLLATCNPN LNPEATTTRI FLKELGTFAQ SSIALHHQYN
51 PKFQTLFQPC NLMGAMQLIE DFSTHVSIDC SPHKTVKKTA NEFPCLPKQV
101 AWILATSKVF MYPELLPVCS LKAKNPQDKI VFTKAEDNLL ALGLKHFEGT
151 EFPNPLISKY LLTCKTAHQL TVRIKNLNMN RAPDNIIKFY KKTKQLPVLG
201 KCCEEIQPHQ WKPPIEREEH RLPFWLKASL PSIQEELRHM ADGAREVGNM
251 TGTTEINSDR SLEKDNLELG SESRYPLLLP KGVVLKLKPV ATRFPRKAWR
301 QKRSSVLKPL LIQPSPSLQP SFNPGKTPAR STHSEAPPSK MVLRIPHPIQ
351 PATVLQTVPG VPPLGVSGGE SFESPAALPA VPPEARTSFP LSESQTLLSS
401 APVPKVMLPS LAPSKFRKPY VRRRPSKRRG VKASPCMKPA PVIHHPASVI
451 FTVPATTVKI VSLGGGCNMI QPVNAAVAQS PQTIPITTLL VNPTSFPCPL
501 NQSLVASSVS PLIVSGNSVN LPIPSTPEDK AHVNVDIACA VADGENAFQG
551 LEPKLEPQEL SPLSATVFPK VEHSPGPPLA DAECQEGLSE NSACRWTVVK
601 TEEGRQALEP LPQGIQESLN NPTPGDLEEI VKMEPEEARE EISGSPERDI
651 CDDIKVEHAV ELDTGAPSEE LSSAGEVTKQ TVLQKEEERS QPTKTPSSSQ
701 EPPDEGTSGT DVNKGSSKNA LSSMDPEVRL SSPPGKPEDS SSVDGQSVGT
751 PVGPETGGEK NGPEEEEEED FDDLTQDEED EMSSASEESV LSVPELQVRA
801 GEYSQVFRGL SNMYHLLICH LLACCTMDSP KIICI






BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_1i24, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphamy2_1i24, frame 3

Report for DKFZphamy2_1i24.3



[LENGTH] 872
[MW] T53bb.2T
[pI] 5.87
[HOMOL] PIR:S4fli»7fl glucan 1,4-alpha-glucosidase (EC 3.2.1.3) - yeast (Saccharomyces cerevisiae) 5e-06
[FUNCAT] 30.01 organization of cell wall [S. cerevisiae, YIR019c] 2e-07
[FUNCAT] 30.10 extracellular/secretion proteins [S. cerevisiae, YIR019c] 2e-07
[FUNCAT] 01.05.01 carbohydrate utilization [S. cerevisiae, YIR019c] 2e-07
[FUNCAT] 02.10 tricarboxylic-acid pathway [S. cerevisiae, YDR148c] 5e-04
[FUNCAT] 30.16 mitochondrial organization [S. cerevisiae, YDR148c] 5e-04
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY X



SEQ KSSAKQLQEVEKVKPQSEKVHQTLILDPAQRKRLQQQMQQHVQLLTQIHLLATCNPNLNP

SEG xxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhccccchhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhcccccccc 

SEQ EATTTRIFLKELGTFAQSSIALHHQYNPKFQTLFQPCNLMGAMQLIEDFSTHVSIDCSPH

SEG 

PRD cchhhhhhhhhhhhhhhhhhhhhcccccceeeeecccchhhhhhhhhhceeeeeeccccc 
SEQ KTVKKTANEFPCLPKQVAWILATSKVFMYPELLPVCSLKAKNPQDKIVFTKAEDNLLALG

SEG 

PRD eeeeeccccccccchhhhhhhccceeeecccccccccccccccceeeeeeeccchhhhhh 

SEQ LKHFEGTEFPNPLISKYLLTCKTAHQLTVRIKNLNMNRAPDNIIKFYKKTKQLPVLGKCC

SEG 

PRD hheeecccccccceeeeeeeehhhhhhhhheeeccccccccceeeeeeccccccccceee 

SEQ EEIQPHQWKPPIEREEHRLPFWLKASLPSIQEELRHMADGAREVGNMTGTTEINSDRSLE

SEG 

PRD eeecccccccccchhhhhccQeeecchhhhhhhhhhhhhhhhhhhcccccccccccceee 

SEQ KDNLELGSESRYPLLLPKGVVLKLKPVATRFPRKAWRQKRSSVLKPLLIQPSPSLQPSFN

SEG xxxxxxxxxxxxxxx xxxxxxxxxxxxxxx

PRD ecccccccccccccccccceeeeeeeeeeeccchhhhhhccccccccccccccccccccc 

SEQ PGKTPARSTHSEAPPSKMVLRIPHPIQPATVLQTVPGVPPLGVSGGESFESPAALPAVPP

SEG xxxxxxxxxxxx

PRD cccccccccccccccccceeecccccceeeeeeccccccccccccccccccccccccccc

SEQ EARTSFPLSESQTLLSSAPVPKVMLPSLAPSKFRKPYVRRRPSKRRGVKASPCMKPAPVI

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ HHPASVIFTVPATTVKIVSLGGGCNMIQPVNAAVAQSPQTIPITTLLVNPTSFPCPLNQS

SEG 

PRD ccccceeecccccceeeeeccccccccccccccccccccccccceeeccccccccccccc 

SEQ LVASSVSPLIVSGNSVNLPIPSTPEDKAHVNVDIACAVADGENAFQGLEPKLEPQELSPL

SEG 

PRD ccccccccccccccccccccccccccccccccccceeecccccccccccccccccccccc 

SEQ SATVFPKVEHSPGPPLADAECQEGLSENSACRWTVVKTEEGRQALEPLPQGIQESLNNPT

SEG

PRD ccccccccccccccccccccccccccccccceeeeeecccccccccccccceeeeccccc 

SEQ PGDLEEIVKMEPEEAREEISGSPERDICDDIKVEHAVELDTGAPSEELSSAGEVTKQTVL

SEG

PRD ccccccccccccccceeeccccccccccccccccccccccccccccccccccccccchhh

SEQ QKEEERSQPTKTPSSSQEPPDEGTSGTDVNKGSSKNALSSMDPEVRLSSPPGKPEDSSSV

SEG 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ DGQSVGTPVGPETGGEKNGPEEEEEEDFDDLTQDEEDEMSSASEESVLSVPELQVRAGEY

SEG xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccchhhhhhhcccchhhhhhhhhhcccccccccceeeeecccc 

SEQ SQVFRGLSNMYHLLICHLLACCTMDSPKIICI

SEG 

PRD eeeeeehhhhhhhhhhhhhhhhcccccccccc 

(No Prosite data available for DKFZphamy2_1i24.3)

(No Pfam data available for DKFZphamy2_1i24.3)
DKFZphamy2_1j11



group: differentiation/development 

DKFZphamy2_1j11 encodes a novel 150 amino acid protein with high
similarity to allograft inflammatory factor-1 of Cyprinus
carpio.

Allograft inflammatory factor-1 (AIF-1) is a protein involved in
allograft rejection. In experimental autoimmune encephalomyelitis
(EAE), neuritis (EAN) and uveitis (EAU), it is produced by
macrophages and microglial cells.

The new protein can find clinical application in the development
of tools to enhance the compatibility of transplanted tissues as
well as in expression profiling of autoimmune diseases and
infections.



strong similarity to allograft inflammatory factor-1 (Cyprinus
carpio)

identical to DKFZphamy2_lnl 

Sequenced by MediGenomix

Locus: /map="50i| .«} cR from top of Chr9 linkage group"

Insert length: 3381 bp
Poly A stretch at pos. 3362, polyadenylation signal at pos. 33m



1 GCCGGAGCCC GGACCAGGCG CCTGTGCCTC CTCCTCGTCC CTCGCCGCGT 

51 CCGCGAAGCC TGGAGCCGGC GGGAGCCCCG CGCTCGCCAT GTCGGGCGAG 

101 CTCAGCAACA GGTTCCAAGG AGGGAAGGCG TTCGGCTTGC TCAAAGCCCG

151 GCAGGAGAGG AGGCTGGCCG AGATCAACCG GGAGTTTCTG TGTGACCAGA 

201 AGTACAGTGA TGAAGAGAAC CTTCCAGAAA AGCTCACAGC CTTCAAAGAG

251 AAGTACATGG AGTTTGACCT GAACAATGAA GGCGAGATTG ACCTGATGTC

301 TTTAAAGAGG ATGATGGAGA AGCTTGGTGT CCCCAAGACC CACCTGGAGA 

351 TGAAGAAGAT GATCTCAGAG GTGACAGGAG GGGTCAGTGA CACTATATCC

401 TACCGAGACT TTGTGAACAT GATGCTGGGG AAACGGTCGG CTGTCCTCAA

451 GTTAGTCATG ATGTTTGAAG GAAAAGCCAA CGAGAGCAGC CCCAAGCCAG

501 TTGGCCCCCC TCCAGAGAGA GACATTGCTA GCCTGCCCTG AGGACCCCGC 

551 CTGGACTCCC CAGCCTTCCC ACCCCATACC TCCCTCCCGA TCTTGCTGCC

601 CTTCTTGACA CACTGTGATC TCTCTCTCTC TCATTTGTTT GGTCATTGAG

651 GGTTTGTTTG TGTTTTCATC AATGTCTTTG TAAAGCACAA ATTATCTGCC

701 TTAAAGGGGC TCTGGGTCGG GGAATCCTGA GCCTTGGGTC CCCTCCCTCT

751 CTTCTTCCCT CCTTCCCCGC TCCCTGTGCA GAAGGGCTGA TATCAAACCA

801 AAAACTAGAG GGGGCAGGGC CAGGGCAGGG AGGCTTCCAG CCTGTGTTCC

851 CCTCACTTGG AGGAACCAGC ACTCTCCATC CTTTCAGAAA GTCTCCAAGC

901 CAAGTTCAGG CTCACTGACC TGGCTCTGAC GAGGACCCCA GGCCACTCTG

951 AGAAGACCTT GGAGTAGGGA CAAGGCTGCA GGGCCTCTTT CGGGTTTCCT

1001 TGGACAGTGC CATGGTTCCA GTGCTCTGGT GTCACCCAGG ACACAGCCAC 
1051 TCGGGGCCCC GCTGCCCCAG CTGATCCCCA CTCATTCCAC ACCTCTTCTC

1101 ATCCTCAGTG ATGTGAAGGT GGGAAGGAAA GGAGCTTGGC ATTGGGAGCC 

1151 CTTCAAGAAG GTACCAGAAG GAACCCTCCA GTCCTGCTCT CTGGCCACAC 

1201 CTGTGCAGGC AGCTGAGAGG CAGCGTGCAG CCCTACTGTC CCTTACTGGG 

1251 GCAGCAGAGG GCTTCGGAGG CAGAAGTGAG GCCTGGGGTT TGGGGGGAAA

1301 GGTCAGCTCA GTGCTGTTCC ACCTTTTAGG GAGGATACTG AGGGGACCAG

1351 GATGGGAGAA TGAGGAGTAA AATGCTCACG GCAAAGTCAG CAGCACTGGT 

1401 AAGCCAAGAC TGAGAAATAC AAGGTTGCTT GTCTGACCCC AATCTGCTTG

1451 AAACCTGACT CTGCTTCTCT CATTTGTCTT CCTACCCTAC TCACATAATT

1501 CACTCATTGA CTCACTCATT CACCAGATAT TTATTGACCT GCTATTATAA

1551 GCTTTACATC CTCCCATGTT GTCCTGGCAT GTGCAGTATA CACGGTCTAA 

1601 CTCATCTCTC CCCAGATCTC TCAGAACCTT GAGCTTGGGA ATTGAACTGG

1651 GGTCACCTGT GTCCTTTCTT ATGGACTCGC AGGATTTTAG AACCCTAATG

1701 CACCCTGGAG GGTAGCTGGG CCAGACTTCT CATTTCACAG GTGAGGAGAC 

1751 TGGTGCCCCA CAGGGATTAA GTGCCTTGCC CAAGGTCAGG CTTATCTCCA

1801 GAGGGAGGTG CCCTGGACTG GGGCCCAGAT GTTCAGGGAC CCTGCCTACA

1851 CCTCATTTCC AGTGTGGGCT GCCTTAGTTA GTTATGAGAA CAGGGAAGGG

1901 CTGGGAAGAG ACAGCCTCCA AGGTCAACAC TTGGAGAGGG TTTCACTTGC

1951 TCTGAAGACC CTGGTCCAGG ATTCGCCCTC TCCCATGCCT TCAAGTCAGC

2001 ATCAGGCTTA GGGCAAAGAC CAGGCCTCTG AAGCTGCCTC TTGTAATTCA

2051 TGCAGGAAGA TGTCAAAGTC AGCCCCATCT TGGCTGATCA GGGTGTTCAG

2101 CCTTAACCCC ACCTGTGTTC TGAAGTCTCT TACCCTACCT GCTCAGGACT

2151 GAGACAGTTA TTCACTGAAC ATATTTATTA AGCACTTGCT GTAGGCCAAC 

2201 AGTTAAGAAT CCAATAATGA AATGGACAGA TTCATGGAAC TTAGAGTCCA

2251 ATAGGAAAGT GAGACCCAGA CAATGACAAT GAGATAAATG TTAGGAAGGG

2301 GGAGGTATGG GGTGACTTCC CTGCAGTCCT GGGGGCCTAC ATGGGCCCAA 

2351 GACTGGGTGA GAGTCTTGGC AGAGCCTTTG CAACACCTTA AGTGGACAGG 

2401 ACTGGGAGGT CTTGGTGGTT GGAGCCAACG TGGGTTCCCT GCGGCTCCTT

2451 AGTCACCTCT GATAGCAGAT TGAGGGAGGA AAACAGGTAA GGCATGAGGA

2501 AATGGCCAGG TTGGGTTAAC CCACTGGTTT CAACCAGTTC AGGAATGAGG

2551 TTATTTGGCC ATGACTGGCT GATCTTGAGC TCAAGGATCT GCTTCAAATG 

2601 CACACAGGCC TAGTTGAAGT TTAAACCCCA GCAAAACATT CCTCCCTGTA

2651 AATGGAAAAT CCTACTTCTA CCCCCACCCT GCCCTGTTTT TTGTTTTTTT

2701 TTTCCCCAAG ATCATTAGAT GTCCTCACCC CTCCTCACTG CCTCTCCTCT 

2751 CTGGGACAGG CTGGGACCTT TGAGGAAGAT AAAGCCTTCC TTGACTACCC

2801 ATCATATTCA GTGTCCCTGT TCCTCACTCA GAGAGGAAGG CAGAACCAGT

2851 CAGGCTTATT TCAGTAAGTT CCACAGTTCT ACAAGACTGC AGGAATTCTC

2901 CTTAAGGGAG GAGAGCAAGC AGGTGTGGCC CCAGCTTCTG GAAATGGCAG

2951 AAGAGAGGGT TTTCTCATTG AATGGGGGTG GGGGCTCGTG TGTCCTGGGA

3001 AACCCCATCA GTCCCTTCAT TTCTTGAGAC TCAACTCCTG GGAGGAGAGG

3051 GTCTCAAGAG TTGTCCCTGG AAGGAGGGCG GGGGCAGTCT GCATCTATTT 

3101 CAGGTTGTGG CTCTTGGTTC TAGGACTCTT ACTTCTCTGG CTAAGGGCTC 

3151 AGCTTCTTGG GACTTCAACC ATCTTCTTTC TGAAAGACCA AATCTAATGT 

3201 AACCAGTAAC GTGAGGACTG CCAAGTATGG CTTTGTCCCT ATGACTCAGA 

3251 GGAGGGTTTG TCGGGCAAAT TCAGGTGGAT GAAGTATGTG TGTGCGTGTG

3301 CATGGGAGTG TGCGTGGACT GGGATATCAT CTCTACAGCC TGCAAATAAA 

3351 CCAGACAAAC TTAAAAAAAA AAAAAAAAAA A 



BLAST Results



Entry AB012301_1 from database TREMBL:

product: "allograft inflammatory factor-1"; Cyprinus carpio mRNA
for allograft inflammatory factor-1, complete cds.

Score = 575, P = 3.7e-54, identities = 113/146, positives =
128/146

Medline entries 

No Medline entry 

Peptide information for frame 2 



ORF from 89 bp to 538 bp; peptide length: 150
Category: strong similarity to known protein
Classification: unclassified

1 MSGELSNRFQ GGKAFGLLKA RQERRLAEIN REFLCDQKYS DEENLPEKLT
51 AFKEKYMEFD LNNEGEIDLM SLKRMMEKLG VPKTHLEMKK MISEVTGGVS
101 DTISYRDFVN MMLGKRSAVL KLVMMFEGKA NESSPKPVGP PPERDIASLP
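The ORF lines in these reports tie nucleotide coordinates to peptide length. A peptide of 150 residues, for example, needs 450 coding nucleotides. The short Python sketch below is an illustrative cross-check and not part of the original reports. It assumes the standard genetic code and 1-based, inclusive ORF coordinates that end at the last sense codon, which is the convention the ORFs in this section appear to follow. The helper names `peptide_length` and `translate` are hypothetical.

```python
# Illustrative cross-check of ORF annotations against a cDNA sequence.
# Assumes the standard genetic code (NCBI translation table 1) and
# 1-based, inclusive ORF coordinates that exclude the stop codon.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO))


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of residues encoded by a 1-based, inclusive ORF span."""
    span = orf_end - orf_start + 1
    if span % 3:
        raise ValueError("ORF span is not a multiple of 3")
    return span // 3


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; '*' marks a stop codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


if __name__ == "__main__":
    print(peptide_length(89, 538))  # 150
    print(translate("ATGTCGGGCGAGCTCAGCAACAGGTTCCAA"))  # MSGELSNRFQ
```

For instance, `peptide_length(89, 538)` returns 150. Translating the first 30 nucleotides of the ORF that starts at position 89 of the cDNA listing above gives MSGELSNRFQ.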



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_1j11, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphamy2_1j11, frame 2



Report for DKFZphamy2_1j11.2



[LENGTH] 150
[MW] 17067.86
[pI] 6.63
[HOMOL] TREMBL:AB012301_1 product: "allograft inflammatory factor-1"; Cyprinus carpio mRNA for allograft inflammatory factor-1, complete cds. 2e-51
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 08.19 cellular import [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 10.02. n other morphogenetic activities [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 03.01 cell growth [S. cerevisiae, YBR109c] 5e-04
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YBR109c] 5e-04
[SCOP] d2mysb_ 1.37.1.5.15 Myosin Essential Chain Myosin Regulatory Chai 5e-20
[SCOP] d1wdcb_ 1.37.1.5.14 Myosin Essential Chain Myosin Regulatory Chai 3e-05
[SCOP] d1osa__ 1.37.1.5.13 Calmodulin I (Paramecium tetraurelia) 3e-16
[SCOP] d1auib_ 1.37.1.5.19 Calcineurin regulatory subunit (B-chain 2e-16
[PIRKW] duplication 7e-06
[PIRKW] mitosis 7e-06
[PIRKW] calcium binding 7e-06
[PIRKW] EF hand 7e-06
[PIRKW] cell division 7e-06
[SUPFAM] unassigned calmodulin-related proteins 3e-47
[SUPFAM] calmodulin 7e-06
[SUPFAM] calmodulin repeat homology 3e-17
[KW] All_Alpha
HZ K Ul ZD 3D 




SEQ MSGELSNRFQGGKAFGLLKARQERRLAEINREFLCDQKYSDEENLPEKLTAFKEKYMEFD
Ictr- 

HHHHHHHHHHHHHT 

SEQ LNNEGEIDLMSLKRMMEKLGVPKTHLEMKKMISEVTGGVSDTISYRDFVNMMLGKRSAVL
Ictr- 

TTTTTCBCHHHHHHHHHHTTTCCCHHHHHHHHHHCTTTTCCCBCHHHHHHHHCCTTTHHH 

SEQ KLVMMFEGKANESSPKPVGPPPERDIASLP
Ictr- HHHHHHTTTTC



(No Prosite data available for DKFZphamy2_1j11.2)
(No Pfam data available for DKFZphamy2_1j11.2)
group: cell cycle

DKFZphamy2_24b4 encodes a novel 698 amino acid protein with
similarity to human STIM1.

The stromal interaction molecule 1 gene (STIM1) encodes a type I
transmembrane protein of unknown function, which induces growth
arrest and degeneration of the human tumor cell lines G401 and RD
but not HBL100 and CaLu-6, suggesting a role in the pathogenesis
of rhabdomyosarcomas and rhabdoid tumors. There is also strong
similarity to a Mus musculus stromal cell protein, which
selectively increases interleukin 7-dependent proliferation of
pre-B cells. The novel protein contains 1 transmembrane domain.

The new protein can find application in modulation of tumour
growth.



similarity to STIM1 (Homo sapiens) 

probably differential polyadenylation: cf. EST-BLAST file.
perhaps complete cds.

Pedant: SIGNAL_PEPTIDE and TRANSMEMBRANE 1
Sequenced by GBF

Locus: /map="131.2 cR from top of Chr4 linkage group"
Insert length: 3305 bp

Poly A stretch at pos. 3274, polyadenylation signal at pos. 3260

1 GGCGCCTTCA TCCCGCCTCG ACTCCTGGCC CAGCGTGGGG CTGGCTGCTG 

51 CGGCGGCGGC GCTGGGCTGC GTTGCTGGTG CTCGGGCTGC TGGTACCCGG 

101 AGCGGCGGAC GGATGCGAGC TTGTGCCCCG GCACCTCCGC GGGCGGQGGG 

151 CGACTGGCTC TGCCGCAACT GCCGCCTCCT CTCCCGCCGC GGCGGCCGGC

201 GATAGCCCGG CGCTCATGAC AGATCCCTGC ATGTCACTGA GTCCACCATG

251 CTTTACAGAA GAAGACAGAT TTAGTCTGGA AGCTCTTCAA ACAATACATA 

301 AACAAATGGA TGATGACAAA GATGGTGGAA TTGAAGTAGA GGAAAGTGAT 

351 GAATTCATCA GAGAAGATAT GAAATATAAA GATGCTACTA ATAAACACAG 

401 CCATCTGCAC AGAGAAGATA AACATATAAC GATTGAGGAT TTATGGAAAC

451 GATGGAAAAC ATCAGAAGTT CATAATTGGA CCCTTGAAGA CACTCTTCAG 

501 TGGTTGATAG AGTTTGTTGA ACTACCCCAA TATGAGAAGA ATTTTAGAGA 

551 CAACAATGTC AAAGGAACGA CACTTCCCAG GATAGCAGTG CACGAACCTT 

601 CATTTATGAT CTCCCAGTTG AAAATCAGTG ACCGGAGTCA CAGACAAAAA

651 CTTCAGCTCA AGGCATTGGA TGTGGTTTTG TTTGGACCTC TAACACGCCC

701 ACCTCATAAC TGGATGAAAG ATTTTATCCT CACAGTTTCT ATAGTAATTG 

751 GTGTTGGAGG CTGCTGGTTT GCTTATACGC AGAATAAGAC ATCAAAAGAA 

801 CATGTTGCAA AAATGATGAA AGATTTAGAG AGCTTACAAA CTGCAGAGCA

851 AAGTCTAATG GACTTACAAG AGAGGCTTGA AAAGGCACAG GAAGAAAACA

901 GAAATGTTGC TGTAGAAAAG CAAAATTTAG AGCGCAAAAT GATGGATGAA

951 ATCAATTATG CAAAGGAGGA GGCTTGTCGG CTGAGAGAGC TAAGGGAGGG

1001 AGCTGAATGT GAATTGAGTA GACGTCAGTA TGCAGAACAG GAATTGGAAC 

1051 AGGTTCGCAT GGCTCTGAAA AAGGCCGAAA AAGAATTTGA ACTGAGAAGC 
1101 AGTTGGTCTG TTCCAGATGC ACTTCAGAAA TGGCTTCAGT TAACACATGA
1151 AGTAGAAGTG CAATACTACA ATATTAAAAG ACAAAACGCT GAAATGCAGC 
1201 TAGCTATTGC TAAAGATGAG GCAGAAAAAA TTAAAAAGAA GAGAAGCACA 
1251 GTCTTTGGGA CTCTGCACGT TGCACACAGC TCCTCCCTAG ATGAGGTAGA 
1301 CCACAAAATT CTGGAAGCAA AGAAAGCTCT CTCTGAGTTG ACAACTTGTT
1351 TACGAGAACG ACTTTTTCGC TGGCAACAAA TTGAGAAGAT CTGTGGCTTT
1401 CAGATAGCCC ATAACTCAGG ACTCCCCAGC CTGACCTCTT CCCTTTATTC
1451 TGATCACAGC TGGGTGGTGA TGCCCAGAGT CTCCATTCCA CCCTATCCAA
1501 TTGCTGGAGG AGTTGATGAC TTAGATGAAG ACACACCCCC AATAGTGTCA 

1551 CAATTTCCCG GGACCATGGC TAAACCTCCT GGATCATTAG CCAGAAGCAG
1601 CAGCCTGTGC CGTTCACGCC GCAGCATTGT GCCGTCCTCG CCTCAGCCTC
1651 AGCGAGCTCA GCTTGCTCCA CACGCCCCCC ACCCGTCACA CCCTCGGCAC
1701 CCTCACCACC CGCAACACAC ACCACACTCC TTGCCTTCCC CTGATCCAGA 
1751 TATCCTCTCA GTGTCAAGTT GCCCTGCGCT TTATCGAAAT GAAGAGGAGG 

1801 AAGAGGCCAT TTACTTCTCT GCTGAAAAGC AATGGGAAGT GCCAGACACA
1851 GCTTCAGAAT GTGACTCCTT AAATTCTTCC ATTGGAAGGA AACAGTCTCC
1901 TCCTTTAAGC CTCGAGATAT ACCAAACATT ATCTCCGCGA AAGATATCAA
1951 GAGATGAGGT GTCCCTAGAG GATTCCTCCC GAGGGGATTC GCCTGTAACT
2001 GTGGATGTGT CTTGGGGTTC TCCCGACTGT GTAGGTCTGA CAGAAACTAA

2051 GAGTATGATC TTCAGTCCTG CAAGCAAAGT GTACAATGGC ATTTTGGAGA
2101 AATCCTGTAG CATGAACCAG CTTTCCAGTG GCATCCCGGT GCCTAAACCT 
2151 CGCCACACAT CATGTTCCTC AGCTGGCAAC GACAGTAAAC CAGTTCAGGA 
2201 AGCCCCAAGT GTTGCCAGAA TAAGCAGCAT CCCACATGAC CTTTGTCATA 
2251 ATGGAGAGAA AAGCAAAAAG CCATCAAAAA TCAAAAGCCT TTTTAAGAAG 

2301 AAATCTAAGT GAACTGGCTG ACTTGATGGA ATCATGTTCA AGTGGCATCT
2351 GTAAACTATT ATCCCCCACC CTCCACTCCC CACCTTTTTT TTGGTTTAAT 
2401 TTTAGGAATG TAACTCCATT GGGGCTTTCC AGGCCGGATG CCATAGTGGA 
2451 ACATCCAGAA GGGCAACTGT CTACTGTCTG CTTATTTAAG TGACTATATA 
2501 TAATCAATTC ATCAAGCCAG TTATTACTGA AAAATCATTG AAATGAGACA 

2551 GTTTACAGTC ATTTCTGCCT ATTTATTTCT GCTTTGTTCT CAGTGATGTA
2601 TATGCAACAT TTTGTTGAAA GCCACGATGG ACTTACAAGC TTTAATGGAC
2651 TCGTAAGCCA GCATGGGCTT GCAAAAATTT CTTGTTTACC AGAGCATCTT
2701 CTTATCTTTC CACAGAGCTA TTTACATCCT GGACTATATA ACTTAAAAGA 
2751 AGTAAAACGT AATTGCACTA CTGTTTTCCA GACTGGAAAA AAAAAAAAAT 

2801 CTCTGCAAGT GAAACTGTAT AGAGTTTATA AAATGACTAT GGATAGGGGA
2851 CTGTTTTCAC TTTTAGATCA AAATGGGTTT TTAAGTAGAA CCTAGGGTTT
2901 CTAATTGACT TGATTTCTGG AAATGAAAAC CCGCGCTTTT ATTATGGGAA
2951 GCTTCTTGAA CTGCATTTAC TATTGTGAAG TTTCAAGTCC CGCTGTAAAG
3001 ATCATGTTGT TTTGTTTTCC CCAGGGCTTT CACTGTGATT TACTGCATTG 

3051 CAGGCTGTAT GATAAAACAC ACATAATTTA AAGAGAGAAG GCTCTTGATT
3101 CCTTATGCAA GTGGAAGAGT TGAAACTTGA TTGAAGGACT TAAAACATTC 
3151 ACAACCTTAA GCCGAGGTGG GGGGATATGG GGATTCAGGC AGTTGTTTAC 
3201 ACACTTTGAA TAACTGCAAA GGATTTACGG TTTGTGAAAA ATGTGTACTG 
3251 TGGAAAAGAT AATAAATTGA AGACATTAAA AAAAAGAAAA AAAAAAAAAA 

3301 AAAAA



BLAST Results 



Entry HS5242610_1 from database TREMBL:

gene: "STIM1"; product: "GOK"; Homo sapiens GOK (STIM1) mRNA,
complete cds.

Score = 1317, P = 4.2e-142, identities = 275/447, positives =
336/447,
frame +3
WO 01/98454 PCT/IB01/02050
Entry MMU47323_1 from database TREMBL:

product: "stromal cell protein"; Mus musculus stromal cell
protein mRNA, complete cds.
Score = 13^, P = ft.fie-ms, identities = 274/447, positives =
336/447,
frame +3

Entry ^17341 from database EMBL:
human STS EST167479.

Score = 13^Q, P = T.1e-57, identities = 264/267



Medline entries



^7D7^2: 

Parker NJ, Begley CG, Smith PJ, Fox RM. Molecular cloning of a
novel human gene (D11S4896E) at chromosomal region 11p15.5.
Genomics 1996 Oct 15;37(2):253-6.

96326680:

Oritani K, Kincade PW. Identification of stromal cell products
that interact with pre-B cells. J Cell Biol 1996 Aug;134(3):771-82.




Peptide information for frame 3 



ORF from 216 bp to 2309 bp; peptide length: 698
Category: similarity to known protein
Classification: Cell signaling/communication
Prosite motifs: RGD (589-591)

1 MTDPCMSLSP PCFTEEDRFS LEALQTIHKQ MDDDKDGGIE VEESDEFIRE
51 DMKYKDATNK HSHLHREDKH ITIEDLWKRW KTSEVHNWTL EDTLQWLIEF
101 VELPQYEKNF RDNNVKGTTL PRIAVHEPSF MISQLKISDR SHRQKLQLKA
151 LDVVLFGPLT RPPHNWMKDF ILTVSIVIGV GGCWFAYTQN KTSKEHVAKM
201 MKDLESLQTA EQSLMDLQER LEKAQEENRN VAVEKQNLER KMMDEINYAK
251 EEACRLRELR EGAECELSRR QYAEQELEQV RMALKKAEKE FELRSSWSVP
301 DALQKWLQLT HEVEVQYYNI KRQNAEMQLA IAKDEAEKIK KKRSTVFGTL
351 HVAHSSSLDE VDHKILEAKK ALSELTTCLR ERLFRWQQIE KICGFQIAHN
401 SGLPSLTSSL YSDHSWVVMP RVSIPPYPIA GGVDDLDEDT PPIVSQFPGT
451 MAKPPGSLAR SSSLCRSRRS IVPSSPQPQR AQLAPHAPHP SHPRHPHHPQ
501 HTPHSLPSPD PDILSVSSCP ALYRNEEEEE AIYFSAEKQW EVPDTASECD
551 SLNSSIGRKQ SPPLSLEIYQ TLSPRKISRD EVSLEDSSRG DSPVTVDVSW
601 GSPDCVGLTE TKSMIFSPAS KVYNGILEKS CSMNQLSSGI PVPKPRHTSC
651 SSAGNDSKPV QEAPSVARIS SIPHDLCHNG EKSKKPSKIK SLFKKKSK




BLASTP hits 

No BLASTP hits available 
Alert BLASTP hits for DKFZphamy2_24b4, frame 3

No Alert BLASTP hits found

Pedant information for DKFZphamy2_24b4, frame 3

Report for DKFZphamy2_24b4.3

[LENGTH] 769
[MW]
[pI] 6.69
[HOMOL] TREMBL:HS5242610_1 gene: "STIM1"; product: "GOK"; Homo sapiens GOK (STIM1) mRNA, complete cds. 1e-151
[BLOCKS] BL00886C Dihydroxy-acid and 6-phosphogluconate dehydratases proteins
[BLOCKS] PR00021D
[BLOCKS] PR010S3F
[BLOCKS] BL00726B AP endonucleases family 1 proteins
[PROSITE] RGD 1
[KW] SIGNAL_PEPTIDE 38
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 15.86 %
[KW] COILED_COIL 8.45 %

SEQ RLHPASTPGPAWGWLLRRRRWAALLVLGLLVPGAADGCELVPRHLRGRRATGSAATAASS
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD cccccccccccchhhhhhhhhhhhhhhhhcccccccccccchhhhhhhcccccccccccc 
COILS



MEM 

SEQ PAAAAGDSPALMTDPCMSLSPPCFTEEDRFSLEALQTIHKQMDDDKDGGIEVEESDEFIR
SEG xxxxxxxxxx

PRD ccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhcccccceeeecchhhhh 
COILS 



MEM 

SEQ EDMKYKDATNKHSHLHREDKHITIEDLWKRWKTSEVHNWTLEDTLQWLIEFVELPQYEKN

SEG 

PRD hhcccccccccccccccccceeeehhhhhhhhhhccccchhhhhhhhhhhhhhcccchhh 
COILS 

MEM

SEQ FRDNNVKGTTLPRIAVHEPSFMISQLKISDRSHRQKLQLKALDVVLFGPLTRPPHNWMKD
SEG 

PRD hhhhhccccccceeeeecccceeeeeecccchhhhhhhhhhheeeecccccccccccchh
COILS 



MEM 
SEQ FILTVSIVIGVGGCWFAYTQNKTSKEHVAKMMKDLESLQTAEQSLMDLQERLEKAQEENR

SEG 

PRD hhheeeeeeccccceeeecccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 
COILS

ccccccccccccccccccccccccccccccccc

MEM MMMMMMMMMMMMMMMMM

SEQ NVAVEKQNLERKMMDEINYAKEEACRLRELREGAECELSRRQYAEQELEQVRMALKKAEK

SEG

PRD ceeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ EFELRSSWSVPDALQKWLQLTHEVEVQYYNIKRQNAEMQLAIAKDEAEKIKKKRSTVFGT

SEG xxxxxxxxxxxxx 

PRD hhhhhccccccchhhhhhhhhhhheeeeccchhhhhhhhhhhhhhhhhhhhhhhhhccce 
COILS 

MEM

SEQ LHVAHSSSLDEVDHKILEAKKALSELTTCLRERLFRWQQIEKICGFQIAHNSGLPSLTSS

SEG 

PRD eeeeeccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcceeeeccccccceee
COILS 



MEM 

SEQ LYSDHSWVVMPRVSIPPYPIAGGVDDLDEDTPPIVSQFPGTMAKPPGSLARSSSLCRSRR

SEG xxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



MEM

SEQ SIVPSSPQPQRAQLAPHAPHPSHPRHPHHPQHTPHSLPSPDPDILSVSSCPALYRNEEEE

SEG x xxxxxxxxxxxx xxxxxxxxxxxxx xxxx 

PRD eeecccccccccccccccccccccccccccccccccccccccceeeeeecccchhhhhhh 
COILS



MEM 

SEQ EAIYFSAEKQWEVPDTASECDSLNSSIGRKQSPPLSLEIYQTLSPRKISRDEVSLEDSSR

SEG x

PRD hhhhhhhhhhcccccccccccccccccccccccceeeeeeeecccccccccccccccccc 
COILS 

MEM 

SEQ GDSPVTVDVSWGSPDCVGLTETKSMIFSPASKVYNGILEKSCSMNQLSSGIPVPKPRHTS

SEG 

PRD cccceeeeeccccccccceeeccccccccccceeeeeeeccccccccccccccccccccc 
COILS 

MEM

SEQ CSSAGNDSKPVQEAPSVARISSIPHDLCHNGEKSKKPSKIKSLFKKKSK

SEG xxxxxxxxxxxxxxxxx 

PRD ccccccccceeeecceeeeeccccccccccccccccccceeeeeecccc 

COILS 

MEM 



PS00016



Prosite for DKFZphamy2_24b4.3
660->663 RGD PDOC00016



(No Pfam data available for DKFZphamy2_24b4.3)
DKFZphamy2_24c8



group: transmembrane protein

DKFZphamy2_24c8 encodes a novel 454 amino acid protein without
similarity to known proteins.

The novel protein contains 1 transmembrane region.
No informative BLAST results. No predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of amygdala-specific genes and as a new marker for
amygdala cells.



putative protein

EST of GEN-4EbHD7 is 141 bp longer at 5' end
perhaps complete cds.
Pedant: TRANSMEMBRANE 1

25 

Sequenced by GBF 

Locus: /map="tiO c 1.7 cR from top of Chr3 linkage group" 

Insert length: 3200 bp

Poly A stretch at pos. 3177, polyadenylation signal at pos. 3156



1 CCTGTCCACA GGGCCCGCTC CAGCAGCCAT GGCAACCACA TCCTCCAAGC 

51 CAGAGGGCCG CCCTCGAGGG CAGGCTGCCC CCACCATCCT GCTGACAAAG

101 CCACCGGGGG CCACCAGCCG CCCCACCACA GCGCCCCCCC GCACTACCAC

151 ACGCAGGCCC CCCAGGCCCC CAGGCTCTTC CCGAAAAGGG GCTGGTAATT

201 CATCACGCCC TGTCCCGCCT GCACCTGGTG GCCACTCCAG GAGTAAAGAA

251 GGACAGCGAG GACGAAATCC AAGCTCCACA CCTCTGGGGC AGAAGCGGCC

301 CCTGGGGAAA ATCTTTCAGA TCTACAAGGG CAACTTCACA GGGTCTGTGG

351 AACCGGAGCC CTCTACCCTC ACCCCCAGGA CCCCACTCTG GGGCTACTCC 

401 TCTTCACCAC AGCCCCAGAC AGTGGCTGCG ACCACAGTGC CCAGCAATAC 

451 CTCATGGGCA CCCACCACCA CCTCCCTGGG GCCTGCAAAG GACAAGCCAG 

5D1 GCCTTCGCAG AGCAGCCCAG GGGGGTGGTT CTACCTTCAC CAGCCAAGGA 

45 551 GGGACACCAG ATGCCACAGC AGCCTCAGGT GCCCCTGTCA GTCCACAAGC 

bOl TGCCCCAGTG CCTTCTCAGC GCCCCCACCA CGGTGACCCA CAGGATGGCC 

b51 CCAGCCATAG TGACTCTTGG CTTACTGTTA CCCCTGGCAC CAGCAGACCT 

701 CTGTCTACCA GCTCTGG6GT CTTCACGGCT GCCACG6GGC CCACCCCAGC 

751 TGCCTTCGAT ACCAGTGTCT CAGCCCCTTC CCAGGGGATT CCTCAGGGAG 

50 SD1 CATCCACAAC CCCACAAGCT CCAACCCATC CCTCCAGGGT CTCAGAAAGC 

651 ACTATTTCTG GAGCCAAGGA GGAGACTGTG GCCACCCTCA CCATGACCGA 

101 CCGGGTGCCC AGTCCTCTCT CCACAGTGGT ATCCACAGCC ACAGGCAATT 

151 TCCTCAACCG CCTGGTCCCC GCCGGGACCT GGAAGCCTGG GACAGCAGGG 

1D01 AACATCTCCC ATGTGGCCGA GGGGGACAAA CCGCAGCACA GAGCCACCAT 

55 1051 CTGCCTGAGC AAGATGGATA TCGCCTGGGT GATCCTGGCC ATCAGCGTGC 

1101 CCATCTCCTC CTGCTCTGTC CTGCTGACGG TGTGCTGCAT GAAGAGGAAG 

1151 AAGAAGACCG CCAACCCGGA GAACAACCTG AGCTACTGGA ACAACACCAT 

1E01 CACCATGGAC TACTTCAACA GGCATGCTGT GGAGCTGCCC AGGGAGATCC 
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WO 01/98454 



PCT/IB01/02050 



1251 AGTCCCTTGA AACCTCTGAG GACCAGCTCT CAGAGCCCCG CTCCCCAGCC 
13D1 AATGGCGACT ATAGAGACAC TGGGATGGTC CTTGTTAACC CCTTCTGTCA 
1351 AGAAACACTG TTTGTGGGAA ACGATCAAGT ATCTGAGATC TAACTACAGC 
1M01 AGGCATCACT TTGCCATTCC GTATTTTTCG TCTCTAAATT ATAAATATAC 
5 1MS1 AAATATATAT ATTATAAATA TAACCTTTGT GTAACCCTGA CTTAATGAGA 
1501 AACATTTTCA GCTTTTTTTC CTATGAATTG TCAACATCTT TTTTACAAGT 
1551 GTGGTTTAAA AAA AAAAAAA CTTTACAGAA TGATCTGTGG CTTTATAAAA 
IbOl TAAAGGTATT TCTAAGCAAA GCAGTTGCAT TGATTGCTTC TCTTAATAAC 
lb51 TATTCTTGAG CACCTGGGGA TCCCAGGAAC CCTGGTCAGG TGAGGTAAGA 

10 17D1 GACTGACCTC CTGTAGAAGC TGAATGTTAC AGTGGTCAAG CGCACGATTC 
1751 TTTGAGTGAT TCTTAAAGCT CTGGTTCCTC TTGATTTGGT GTGACCCCAT 
IflOl TTCCTCCCTT CTCATACGCA CACCTGTAAA GGGAACTGGA CCGCCTCAGG 
1651 GGAAGACGGC AGACTCATGC ACAGAGAAGG AAAAGGGAAC ATCTCATCAC 
._„*1301 CTCTGAGGAT GAGTACCCTG GAGCCTTATG ACGGCACCAT TGGATGTCAT 

15 1151 GTTTAATTCC ATCCAAGTTG TGGATGGCAG GCAGGAGCAT GGAGCCCTCA 
5001 GGAATCCATG GAGGACATCA AGGCATCCCA AGGCCATATT CCCCTAACAT 
E0S1 TACTTCCACT GCTAACAACA GGACTGCCTT TCCCTGGTGG GAAAATGCTC 
2101 CCTTTATGCC CATTCCTGTA TCCCCTCCAA CACCCACATC TGCATTAAAC 
2151 ACCCGTGCCT TTCTCTTGGA GAGGGTTTAG ATGCAGATCC CGGCCCTGGA 

20 2201 GCTTTAAAAT GCTTGCCCTT CCTTCTTCAA GGATCAAATG TTTATTGGGG 
2251 TTCAGCTTTG TTTTCTCAAA AGGCCATGGT ATCGTGCCCC TGAGGAACAT 
2301 GTTTATCTAA GAAGCTTTGA GGTAGTAGAG CGATAATTTT TGAAACCTTC 
2351 CTCCTGCAAT CTTTAAAAAA GAAAAAAAAG ATTGCCCAAA CAAATCATTT 
2M01 GGGAGAAGAC ATCATTATAC TCCTACTTGG CACTGCAAAC CTGCTC6CAG 

25 2M51 CACCAGCCGG TGGACTTGCC ATCCAGCTCT CAGCTTCCAC TGCTCCCCTT 
25D1 GTTCCCGGCC GGCTGGCTGC CTCCCCGTGC TGTGTCCAGC ACGGCCAACA 
2551 ACGTCAGACC CTCAGAGACG CCCAAGGGGC TTCCAGAGGT GGCCGCTTCT 
2b01 CTATTTTTTC CTGATTGTGG CTGAGAGAGA TGATTACTGC TTTGACACTT 
2bSl CCTTTCTCTA AAAGAAAAAT AGTTTGATAG TATATTTTGA ATATAGATGC 

30 27D1 TCTTATAGTC AGATTGGGAA TTGAACTTGA ATATTGGGTC ATATGTTTGT 
2751 GTTGTTGCTG TAGTCTATCA TGACTTTTTT CTTTCTGCAT TTTCCTTAAA 
2AD1 AAAAAAAAAA AGATGGCCTT CAAAAGTGTG TTCTCAATGT TGTATGAACC 
2fl51 TCCTTCACAT GAGTTCGGTT GTTGTCTCTC TTCAAAGACT CTTCAACCCA 
2101 CAAAGAAGCA ACTAAATGTT TCTCTAAGTT TAATTTTCTA GCGTGTTGTT 

35 2151 GTCTTACCTT TTTAACCTTA CCATAATATT TCTGTTAACT GTTACATTTA 
3001 ATATACCAAT GTGTGTAAGT ATACAGAGAA AAATCTGTTT GTAAAGTAAA 
3D51 ATTTATATAT AATATATGTA ATCAAAGATA CATATGTTAT ATATACATAT 
3101 GTGGATGTAT GACTTATTTT TCCTTATCCA CAGATTTCAG CTACCATGTA 
3151 TATATAAATA AACTTATTTT ATTAGCCAGA GAAAAAAAAA AAAAAAAAAA 

40 



BLAST Results 



45 No BLAST result 



Medline entries 



50 

No Medline entry 



55 Peptide information for frame 2 



ORF from 2=1 bp to 13^D bp, peptide length: H5M 
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WO 01/98454 



PCT/IB01/02050 



Category: putative protein 

Classification: Transmembrane proteins unclassified 

1 MATTSSKPEG RPRGCAAPTI LLTKPPGATS RPTTAPPRTT TRRPPRPPGS 
5 51 SRKGAGNSSR PVPPAPGGHS RSKEGflRGRN PSSTPLGflKR PLGKIFfllYK 

101 GNFTGSVEPE PSTLTPRTPL UGYSSSP(2P<2 TVAATTVPSN TSUAPTTTSL 
151 GPAKDKPGLR RAAC3GGGSTF TSflGGTPDAT AASGAPVSPfl AAPVPSiJRPH 
EDI HGDPflDGPSH SDSULTVTPG TSRPLSTSSG VFTAATGPTP AAFDTSVSAP 
251 SflGIPflGAST TPflAPTHPSR VSESTISGAK EETVATLTMT DRVPSPLSTV 
10 301 VSTATGNFLN RLVPAGTUKP GTAGNISHVA EGDKP(2HRAT ICLSKMDIAW 
351 VILAISVPIS SCSVLLTVCC MKRKKKTANP ENNLSYUNNT ITMDYFNRHA 
i»Dl VELPREIflSL ETSEDflLSEP RSPANGDYRD TGMVLVNPFC <3ETLFVGND<2 
H51 VSEI 

15 

BLASTP hits 

No BLASTP hits available 

20 

Alert BLASTP hits for DKFZphamy5_BMcfii frame H 
No Alert BLASTP hits found 
25 Pedant information for DKFZphamy2_2Hca ■> frame 2 



Report for DKFZphamy2_2'4cfi .2 

30 

ELENGTH3 1b3 

emio ^fl277.a^ 

IpI3 T-flD 

IFUNCAT3 Tfl classification not yet clear-cut ES- cerevisiaei 

35 YJRlSlcH 2e-[m 

([BLOCKS] PR0CH12F 

IBL0CKS3 BPD3b1bF 

IKliO TRANSMEMBRANE 1 

CKliO L0U_C0MPLEXITY 15-55 '/. 

40 

SE<2 LSTGPAPAAMATTSSKPEGRPRGflAAPTILLTKPPGATSRPTTAPPRTTTRRPPRPPGSS 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccchhhhhhhhcccccccccccccceeeeccccccccccccccccccccccccccccc 
45 MEM 

SE(J RKGAGNSSRPVPPAPGGHSRSKEG(2RGRNPSSTPLG(2KRPLGKIF<2IYKGNFTGSVEPEP 

SEG x 

PRD cccccccccccccccccccccccccccccccccccccccccceeeeeecccccccccccc 

50 MEM 

SEfl STLTPRTPLIi)GYSSSP<3P<2TVAATTVPSNTSIilAPTTTSLGPAKDKPGLRRAA<2GGGSTFT 

SEG xxxxxxxx 

PRD ccccccccccccccccccceeeeeecccccccccccccccccccccccceeecccccccc 

55 MEM 

SEfl SflGGTPDATAASGAPVSPflAAPVPSflRPHHGDPflDGPSHSDSULTVTPGTSRPLSTSSGV 
SEG xxxxx. - • • xxxxxxxxxxxxxxxxx -. 
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WO 01/98454 PCT/IB01/02050 

PRD ccccccccccccccccccccccccccccccccccccccccceeeeeccccccccccccce 

MEM 

SEfl FTAATGPTPAAFDTSVSAPS<2GIPflGASTTP(2APTHPSRVSESTISGAKEETVATLTMTD 

5 SEG xxxxxxxxxxxxxx 

PRD eeeeccccccccccccccccccccccccccccccccccceeeeeecccchhhhhhhcccc 

mem 

se(2 rvpsplstvvstatgnflnrlvpagtukpgtagnishvaegdkpflhraticlskmdiauv 

10 SEG 

PRD ccccccceeeeeccccccccccccccccccccccceeecccccccceeeccccchhhhhh 

mem ii 

SE<2 ILAISVPISSCSVLLTVCCMKRKKKTANPENNLSYUNNTITMDYFNRHAVELPREIflSLE 

15 SEG 

PRD hhhhccccccccceeeehhhhhhccccccccccccccccccccccccccccccchhhhhc 

nEn nnnnnrinririMririnnrjn 

SEfl TSED<2LSEPRSPANGDYRDTGMVLVNPFC<2ETLFVGND(2VSEI 

20 SEG 

PRD cccccccccccccccccccceeeeecccccceeeeeccccccc 

MEM 

25 (No Prosite data available for DKFZphamy2_24cfl - 2) 

(No Pfam data available for DKFZphamy2_2 l lcfi.2) 
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WO 01/98454 
DKFZphamy2_2 l 4kl5 



PCT/IB01/02050 



5 group: amygdala derived 

DKFZphamyE_2 l 4kl5 encodes a novel 271 amino acid protein with weak 
similarity to pecanex of Drosophila melanogaster. 

10 Pecanex is a maternal-effect neurogenic gene-i involved in 
differentiation processes in the developing central nervous 
system. DKFZphamy2_2'4kl5.p3 seems to be expressed ubiquitiously • 



The new protein can find application in studying the expression 
15 profile of amygdala-specific genes and as a new marker for 
amygdala cells- 



20 



similarity to pecanex (Drosophila melanogaster) 
probably complete cds- 



Sequenced by GBF 



25 Locus: unknown 

Insert length: myn bp 

Poly A stretch at pos. lMMSi polyadenylation signal at pos. 1121 



30 

1 AAGGAAAACA AGAGGACATG 

SI TCACATTCTC ACTTAGTATG 

101 GCCCAGTTCC AAAATGAAGG 

151 ACCAATTTGT TCTAAGGCAG 

35 201 TCAAATGTAC TGGAAGAAAT 

251 TGTTCATACA GTAATGACTT 

301 TGGCTCCTAG TCCTGGTCAT 

351 TGGTCTGTTG CTTTGGACTG 

H01 AGCACTGAAA GCATTCAGGT 

40 H51 GTTTAGGTCC AATAGAAGAC 

501 TATGAACGTG ACTGGTACAT 

551 AGCAATTTTA CAAGAAAAGC 

fe.01 ATATGGGAAT TTACACTGGG 

b51 CAAGTGGGAA AGTTAAATCC 

45 701 TTCATGGGAA TTACTTTATG 

751 TACAAGCTCA TCCACTACTT 

601 CCTCCCCTGG GATATCCGAT 

SSI GTATTAGAGC TCATTTTGAC 

101 TTTTCATCCT AAAAAAGTAA 

50 151 ACACCCCCAT TCAGATGCCT 

1001 TCTTAGCCAT CTTAATGGTT 

1051 AAGATAAAAG AACTATTTGG 

1101 GCTTTGGTAA ATGTGAGAAA 

1151 TGTAATGCTT TCTTAAATGT 

55 1201 AGCTATGTAA TTTTTACATA 

1251 TAGTGATAAT GGATCCTGTT 

1301 TGAAATATAA TTTATAGTAT 

1351 AATATCTGAA TGTTTTTGTA 



CCATATATTC CTCTCATGGA GTTCAGTTGT 
CTTACCCGCA GAGTGGAGGA CTAGCTGTAT 
AGATGAGCTC GTTATTTCCA GAAGACTGGT 
TTGGAATGTT ATCATTCAGA AGAGAAGGCC 
TGCCAAGGAC AAAGTTTTAA AAGACTTTTA 
GTTATTTTAG TTTATTTGGA ATAGACAATA 
ATATTGAGAG TTTACGGTGG TGTTTTGCCT 
GCTCACAGAA AAGCCAGAAC TGTTTCAACT 
ATACTCTGAA ACTAATGATT GATAAAGCAA 
TTTAGAGAAC TGATTAAGTA CCTTGAAGAA 
TGGTTTGGTA TCTGATGAAA AGTGGAAGGA 
CATACTTGTT TTCTCTGGGG TATGATTCTA 
AGAGTGCTTA GCCTTCAAGA ATTATTGATC 
TGAAGCTGTT AGAGGTCAGT GGGCCAATCT 
CCACAAACGA TGATGAAGAA CGTTATAGTA 
TTAAGAAATC TTACGGTACA AGCAGCAGAA 
TTATTCTTCA AAACCTCTCC ACATACATTT 
TGTAATGTCA TCAAATGCAA TGTTTTTATT 
CTGTGATTCT TGTAACTTGA GGACTTCTCC 
GAGAACAGCT AAGCTCCGTA AAGTTGGTTC 
CTAAAAAACA GCAAAAACAT CTTTATGTCT 
CCAATATTTG TGCCCTCTGG ACTTTAGTAG 
ACTTTTGTAG AATTATCATA TAATGAATTT 
GTTATAGGTG AATTGCCATA CAAAGTTAAC 
CTTAAGAGAT AAACATATCA GTGTTCTAAG 
GAAGGTTAAC ATAATGTGTA TATATTTGTT 
TTTCAAATGT GCTGATTTAT TTTGACATCT 
TCAAGTAGTT TGTTTTCATA GACTTCAATT 
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WO 01/98454 PCT/IB01/02050 

1M01 CATAAACTTT AAAAAACTTT TAATAAAATA TTTTCCTTCC TTTTCAAAAA 
mSl AAAAAAAAAA AAAA 



BLAST Results 



Entry ACOO?^ from database EMBLNEW: 

Homo sapiens clone M22_H_S-. DORKING DRAFT SEQUENCE-. 5 unordered 
10 pieces- 
Score = mibi P = O-Oe+DOn identities = flHD/6Sfi 
3 exons 



15 

Medline entries 

No Medline entry 

Peptide information for frame 3 

ORF from 18 bp to fiSH bpi peptide length: 271 
Category: similarity to known protein 
Classification: unclassified 

30 1 MPYIPLMEFS CSHSHLVCLP AEWRTSCMPS SKMKEMSSLF PEDWYflFVLR 

SI (JLECYHSEEK ASNVLEEIAK DKVLKDFYVH TVMTCYFSLF GIDNMAPSPG 

101 HILRVYGGVL PUSVALDULT EKPELFflLAL KAFRYTLKLM IDKASLGPIE 

151 DFRELIKYLE EYERDUYIGL VSDEKIilKEAI LflEKPYLFSL GYDSNMGIYT 

201 GRVLSLflELL IflVGKLNPEA VRGfiWANLSU ELLYATNDDE ERYSK3AHPL 

35 2S1 LLRNLTVtJAA EPPLGYPIYS SKPLHIHLY 



20 



25 



BLASTP hits 

40 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2m<lS-. frame 3 
45 No Alert BLASTP hits found 

Pedant information for DKFZphamy2_2Mkl5i frame 3 

50 Report for DKFZphamy2_21kl5 . 3 



ELENGTHJ 2fi4 
CMU3 33Qt»b.31 
55 EpIJ 5.17 

EHOMOLJ TREMBL : AFDb7b0fi_ll gene: "BDS11 • 12 n % 

Caenorhabditis elegans cosmid BD511- Be-13 
EKIiU Alpha_Beta 



-154- 



WO 01/98454 



PCT/1B01/02050 



SEfl GK(3EDI1PYIPLI1EFSCSHSHLVCLPAE:iilRTSCI1PSSKnKEI1SSLFPE]>lilY(aFVLR(2LECY 

PRD ccccccccccceeeecccceeeeecccccccccccccccccccccccchhhhhhhhhhhh 

5 

SEA HSEEKASNVLEEIAKDKVLKDFYVHTVMTCYFSLFGIDNMAPSPGHILRVYGGVLPliJSVA 

PRD hhhhhhhhhhhhhhhhhhhhhhheeeeeeeeeeeecccccccccceeeeeeccccccccc 

SEfl LDULTEKPELFflLALKAFRYTLKLMIDKASLGPIEDFRELIKYLEEYERDUYIGLVSDEK 

10 PRD cchhhhhchhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhheeeeeccccc 

SEtJ UKEAIL(3EKPYLFSLGYDSNMGIYTGRVLSL(JELLI(2VGKLNPEAVRG(2UANLSUELLYA 

PRD hhhhhhhhcchhhhhhhhcccccchhhhhhhhhhhhheeeeechhhhhhhhhhhhheeee 

15 SE<3 TNDDEERYSI<3AHPLLLRNLTV<2AAEPPLGYPIYSSKPLHIHLY 

PRD cccccccccccchhhhhhhhhhhccccccccccccccccccccc 



20 



(No Prosite data available for DKFZphamyS_S1klS.3) 
(No Pfam data available for DKFZphamy5_E4kl5.3> 
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5 group: amygdala derived 

DKFZphamyE_Eal3 encodes a novel MID amino acid protein without 
similarity to known proteins- 

10 No informative BLAST resultsi No predictive prositen pfam or SCOP 
motif e - 

The new protein can find application in studying the expression 
profile of amygdala-specific genes- 

15 

putative protein 
perhaps complete cds- 

20 

Sequenced by MediGenomix 

Locus: /map="ltpl3-3" 

25 Insert length: E5ai bp 

Poly A stretch at pos- 25bE-i polyadenylation signal at pos- E5M5 



1 GTTCCTGAGG ACGTGCTACG GGGGCAGCTT CCTGGTACAC GAGTCGTTCC 

30 51 TCTACAAGCG GGAGAAGGCT GTCGGGGACA AGGTGTATTG GACCTGCCGG 

1D1 GACCACGGGC TGCACGGCTG CCGGAGCCGG GCCATCACCC AGGGACAGCG 

151 GGTGACTGTG ATGCGT6G6C ACTGCCACCA GCCCGATATG GAGGGCCTGG 

EDI AAGCCCGGCG GCAGCAGGAG AAGGCCGTGG AGACGCTGCA GGCTGGGCAG 

E51 GACGGCCCTG GGAGCCAAGT GGACACGCTG CTCCGAGGCG TGGATAGTTT 

35 3D1 GCTCTACCGC AGGGGTCCGG GTCCCCTGAC TCTCACCAGG CCTCGGCCCA 

351 GAAAGCGAGC AAAGGTCGAA GACCAGGAGC TGCCAACCCA GCCCGAGGCC 

MD1 CCAGACGAGC ACCAGGACAT GGACGCAGAC CCGGGAGGCC CTGAGTTCCT 

M51 GAAGACGCCC CTGGGGGGCA GCTTCCTGGT GTACGAGTCC TTCCTCTACC 

5Q1 GGCGGGAGAA GGCGGCTGGG GAGAAGGT.GT ATTGGACCTG CCGGGACCAG 

40 551 GCCCGCATGG GCTGCCGCAG CCGCGCCATC ACCCAGGGCC GACGGGTGAC 

LD1 TGTCATGCGT GGTCACTGCC ACCCGCCCGA CCTGGGAGGC CTGGAGGCCC 

b51 TGAGGCAGCG GGAGAAACGC CCCAACACGG CGCAGCGGGG GAGCCCAGGC 

701 GCTGGCCTCT CTTTCCAGTG GCTCTTCCGG ATCCTGCAGC TTTTGGGTCA 

751 TGCTCCTGTG CTGCTGTGCC CCTCAGGGTC CTCCTGCCTC CCGAGCCTCC 

45 flOl CTGCTCCACA TGGCCCCTGC CCAGCCCTCT CCATCCCTCT TGAAGGAGGC 

851 CCC6AGTTCC TGAAGACGCC CCTGGGGGGC AGCTTCCTGG TGTACGAGTC 

^□1 CTTCCTCTAC CGGCGGGAGA AGGCGGCCGG GGAGAAGGTG TATTGGACCT 

151 GCCGGGACCA GGCCCGCATG GGCTGCCGCA GCCGCGCCAT CACOCAGGGC 

1001 CGGCGGGTCA TGGTCATGCG CAGGCACTGC CACCCACCGG ACCTGGGCGG 

50 1051 CCTGGAGGCC CTGCGGCAGC GGGAGCACTT CCCCAACCTG GCGCAGTGGG 

1101 ACAGCCCAGA TCCTCTCCGG CCCCTGGAGT TCCTGAGGAC TTCCCTGGGG 

1151 GGCAGGTTCC TGGTGCACGA GTCCTTCCTC TACAGGAAGG AGAAGGCGGC 

1E01 TGGGGAGAAG GTGTACTGGA TGTGCCGGGA CCAGGCTCGG CTGGGCTGCC 

1ES1 GCAGCCGCGC CATAACCCAG GGCCACCGCA TCATGGTCAT GC6CAGCCAC 

55 1301 TGCCATCAGC CTGACCTGGC AGGCCTGGAG GCCTTGAGGC AACGGGAGCG 

1351 GCTCCCCACC ACGGCCCAGC AGGAGGACCC AGAAAAGATT CAAGTTCAGC 

1M01 TGT6CTTCAA GACGTGTTCT CCTGAAAGCC AGCAGATTTA TGGGGACATC 

mSl AAAGACGTCA GACTGGATGG CGAGTCCCAG TGAGGCGATG TGGGCAGAGG 
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1501 AGCTCCGAGC CGCCCACCCA 
1SS1 CATCCACCTA GGTTTGGCTT 
IbOl ATCGATGGTC TTCGCGTCTC 
IbSl ATGGTGTCCT CATGTCGGCG 
5 17D1 ACGCAGCTGT CGTGGGGCAG 
17S1 CATGACAAAG CTGCCTGGAC 
1601 CCCTGGGTTT GCAGAGCACG 
1651 GCCCCGCTCT GCTCAGCACG 
1101 TGGGCACGTT TGGGGAAGTT 

10 1151 CCAGGTCAAC CCACACCAAT 
aOOl CTGGTCTCTG GCCGCCTGCT 
E0S1 GCCGGCCAAA CCAGAGGCCT 
2101 CTCCTTCCAA GTTAAATTAA 
E151 TCATTCCCGA GATGGGAGCC 

15 2201 GCACACGTGC CCTGGCTGAG 
2251 TCCTGGGCAG CAAAGGCGTG 
2301 TGGCTTCACC AGTCAGAGGG 
2351 GGGACTGCAG AGCCTCCTCC 
2M01 GGTCCTTCAC TCCCACCCTG 

20 2M51 TGTCCCCTGG CAAGTTGGCC 
2501 ACAGCCTGGG CACCCCTGCT 
2551 ATCCTATTTT CCATCAAAAA 



AGGTGGCTTC 
AGCAGAAACT 
CTCAGGAGGT 
GAGAACAGTG 
GGCGGTGGCG 
ACGGACGCCC 
CAGCCTTCCT 
GTGCAAAGTG 
CCTGCTTCAA 
CTTTTCTGGA 
GCCAGGGTGT 
CGCTCCGCAC 
ACCCCCTCTC 
AGTCCAG6GG 
GCCAGCGGCA 
TCCCCTTCTG 
AGCAGTCCGG 
TTACTAACAA 
TAATTGTGGG 
ACGGAACCCA 
TCTCCTCTGC 
AAAAAAAAAA 



ACATCCACAC 
TCTTTTCATT 
CTCCCAGGAG 
CTCAGAGCTG 
CCTTCCTGAC 
CTGCTGTACG 
AGGGCTTTCC 
AATGCTGCTG 
ACTGAGCTGC 
CAGGTGCTGG 
GGCCATCCCC 
TCCACACTTT 
CACGATTCCC 
TCAGCAGGAG 
TCCTGGGTGG 
TCAGACAGCT 
AGAGGCAAGA 
GGACCTGTCC 
GGGAGTGCCA 
CCATGCACTG 
TTGTACGGTT 
AAAA 



AGGCACTTCC 
CTTCCAAAGC 
GAATTCTTGG 
GCGCTTGCAG 
CTTTGGAAGA 
GCCACAGCAC 
ACCTGGCGAG 
TCTTGGAGCC 
CCCGCATAGG 
GTAGGCCTTC 
AGCAACCGGA 
CCTTTCTGTG 
ACGGCAGGCG 
CCAGCGCTGG 
CCCAGGTCCA 
TCACAGAGTG 
TGACCCCACC 
GCAGCCGCGA 
GCAACAGGCC 
CAAGGCTGT6 
CCCCCAATAA 



25 



BLAST Results 



30 



No BLAST result 



Medline entries 



35 



No Medline entry 



40 



Peptide information for frame 2 



ORF from Itl bp to mflO bpn peptide length: i^D 
Category: putative protein 
Classification: no clue 



45 1 URGHCHflPDM 

51 RGPGPLTLTR 
101 LGGSFLVYES 
151 GHCHPPDLGG 
201 LLCPSGSSCL 

50 251 RREKAAGEKV 
301 LRtJREHFPNL 
351 VYUMCRDflAR 
M01 TAfldEDPEKI 



EGLEARRfiflE 
PRPRKRAKVE 
FLYRREKAAG 
LEALRflREKR 
PSLPAPHGPC 
YUTCRDflARil 
AflUDSPDPLR 
LGCRSRAIT<3 
(3VC3LCFKTCS 



KAVETLfiAGfl 
DfiELPTflPEA 
EKVYUTCRDfl 
PNTAflRGSPG 
PALSIPLEGG 
GCRSRAITflG 
PLEFLRTSLG 
GHRIilVtlRSH 
PES(2<2IYGDI 



DGPGSflVDTL 
PDEHfiDflDAD 
ARMGCRSRAI 
AGLSFflWLFR 
PEFLKTPLGG 
RRVMVMRRHC 
GRFLVHESFL 
CHflPDLAGLE 
KDVRLDGESfl 



LRGVDSLLYR 
PGGPEFLKTP 
TflGRRVTVUR 
ILflLLGHAPV 
SFLVYESFLY 
HPPDLGGLEA 
YRKEKAAGEK 
ALRdJRERLPT 



55 



BLASTP hits 
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WO 01/98454 PCT7IB01/02050 
No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2al3-. frame 2 

5 No Alert BLASTP hits found 

Pedant information for DKFZphamy2_2al3i frame 2 

10 Report for DKFZphamy2_2al3 - 2 

[LENGTH! 

Mid J 55BM0-13 

15 Cpll T.33 

IKU3 Alpha_Beta 

IKhO L0U_C0I1PLEXITY b.21* 

20 SEfl FLRTCYGGSFLVHESFLYKREKAVGDKVYUTCRDHALHGCRSRAITflGflRVTVriRGHCHfl 

SEG 

PRD ccccccccceeeccchhhhhhhhhccceeeeecccccccccceeeeccceeeeeeccccc 

SEfl PDflEGLEARRflflEKAVETLflAGflDGPGSflVDTLLRGVDSLLYRRGPGPLTLTRPRPRKRA 

25 SEG xxxxxxxxxxxxxxx ■ • - 

PRD cccchhhhhhhhhhhhhhhhhcccccccccccccccccceeeeecccceeecccccchhh 

SEfl KVEDflELPTflPEAPDEHflPMDADPGGPEFLKTPLGGSFLVYESFLYRREKAAGEKVYUTC 

SEG 

30 PRD hhhhhcccccccccccccccccccccccccccccccceeeehhhhhhhhhhhccceeeec 

SEfl RDflARIIGCRSRAITflGRRVTVIIRGHCHPPDLGGLEALRflREKRPNTAflRGSPGAGLSFflU 

SEG 

PRD cchhhhhccceeecccceeeeeecccccccccchhhhhhhhhccccccccccccchhhhh. 

35 

SEfl LFRILflLLGHAPVLLCPSGSSCLPSLPAPHGPCPALSIPLEGGPEFLKTPLGGSFLVYES 

SEG xxxxxxxxxxxxxxxx 

PRD hhhhhhhhhccceeeccccccccccccccccccccccccccccccccccccccceeeehh 

40 SEfl FLYRREKAAGEKVYUTCRDflARMGCRSRAITflGRRVMVHRRHCHPPDLGGLEALRflREHF 

SEG 

PRD hhhhhhhhhccceeeeccchhhhhccceeecccceeeeeecccccccccchhhhhhhhhc 

SEfl PNLAflUDSPDPLRPLEFLRTSLGGRFLVHESFLYRKEKAAGEKVYUIICRDflARLGCRSRA 

45 SEG 

PRD ccccccccccccchhhhhhhcccceeeeecchhhhhhhhccceeeecchhhhhhhccccc 

SEfl ITflGHRIMVnRSHCHflPDLAGLEALRflRERLPTTAflflEDPEKIflVflLCFKTCSPESflfllY 

SEG 

50 PRD ccccceeeeeeccccccccchhhhhhhhhhhhhccccccccceeehhhhhcccccccccc 

SEfl GDIKDVRLDGESfl 

SEG 

PRD ccccccccccccc 



55 



(No Prosite data available for DKFZphamy2_2al3-2) 
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5 group: differentiation/development 

DKFZphamy2_2bn encodes a novel 76=1 amino acid protein which 
originates 

from TXBP1S1 mRNA by alternative splicing. 

10 

It is ubiquitously expressed. The mRNA is also subject to 
alternative polyadenylation . Overexpression of TXBP151 in NIH3T3 
cells causes inhibition of 

apoptosis induced by tumour necrosis factor (TNF). It binds to 
15 A2Di which is 

also an inhibitor of cell death by a yet unknown mechanism* 

The new protein can find application in modifying/blocking 
apoptosic pathways and therefore serve as a tool in diagnosis of 
20 cancer predisposition and as a tool in cell culture* 



TXBP151-. differentially spliced 

25 differential splicing 

differential polyadenylation 

Sequenced by flediGenomix 

30 Locus: /map="7pl5" 

Insert length: 302fl bp 

Poly A stretch at pos. 2flflSi polyadenylation signal at pos. 2flbfl 

35 

1 GAAGAGGTTC GGCGGCTGAT GGCGGATCAG GATCGGAAGC CTGCGTAACT 

51 TTCTCCCTTG ATCCGGGAGT CTTTCCACTG GATTCACAAT GACATCCTTT 

1D1 CAAGAAGTCC CATTGCAGAC TTCCAACTTT GCCCATGTCA TCTTTCAAAA 

151 TGTGGCCAAG AGTTACCTTC CTAATGCACA CCTGGAAT6T CATTACACCT 

40 201 TAACTCCATA TATTCATCCA CATCCAAAAG ATTGGGTTGG TATATTCAAG 

251 GTTGGATGGA GTACTGCTCG TGATTATTAC ACGTTTTTAT GGTCCCCTAT 

301 GCCTGAACAT TATGTGGAAG GATCAACAGT CAATTGTGTA CTAGCATTCC 

351 AAGGATATTA CCTTCCAAAT GATGATGGAG AATTTTATCA GTTCTGTTAC 

M01 GTTACCCATA AGGGTGAAAT TCGTGGAGCA AGTACACCTT TCCAGTTTCG 

45 MSI AGCTTCTTCT CCAGTTGAAG AGCTGCTTAC TATGGAAGAT GAAGGAAATT 

SD1 CTGACATGTT AGTGGTGACC ACAAAAGCAG GCCTTCTTGA GTTGAAAATT 

551 GAGAAAACCA TGAAAGAAAA AGAAGAACTG TTAAAGTTAA TTGCCGTTCT 

t.01 GGAAAAAGAA ACAGCACAAC TTCGAGAACA AGTTGGGAGA ATGGAAAGAG 

b51 AACTTAACCA TGAGAAAGAA AGATGTGACC AACTGCAAGC AGAACAAAAG 

50 701 GGTCTTACTG AA6TAACACA AAGCTTAAAA ATGGAAAATG AAGAGTTTAA 

751 GAAGAGGTTC AGTGATGCTA CATCCAAAGC CCATCAGCTT GAG6AAGATA 

301 TTGTGTCAGT AACACATAAA GCAATTGAAA AAGAAACCGA ATTAGACAGT 

851 TTAAAGGACA AACTCAAGAA GGCACAACAT GAAAGAGAAC AACTTGAATG 

=iDl TCAGTTGAAG ACAGAGAAGG ATGAAAAGGA ACTTTATAAG GTACATTTGA 

55 =151 AGAATACAGA AATAGAAAAT ACCAAGCTTA TGTCAGAGGT CCAGACTTTA 

1001 AAAAATTTAG ATGGGAACAA AGAAAGCGTG ATTACTCATT TCAAAGAAGA 

1051 GATTGGCAGG CTGCAGTTAT GTTTGGCTGA AAAGGAAAAT CTGCAAAGAA 

1101 CTTTCCTGCT TACAACCTCA AGTAAAGAAG ATACTTGTTT TTTAAAGGAG 
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1151 CAACTTCGTA AAGCAGAGGA ACAGGTTCAG GCAACTCGGC AAGAAGTTGT 
1ED1 CTTTCTGGCT AAAGAACTCA GTGATGCTGT CAACGTACGA GACAGAACGA 
1251 TGGCAGACCT GCATACTGCA CGCTTGGAAA ACGAGAAAGT GAAAAAGCAG 
1301 TTAGCTGATG CAGTGGCAGA ACTTAAACTA AATGCTATGA AAAAAGATCA 
5 1351 GGACAAGACT GATACACTGG AACACGAACT AAGAAGAGAA GTTGAAGATC 
mOl TGAAACTCCG TCTTCAGATG GCTGCAGACC ATTATAAAGA AAAATTTAAG 
m51 GAATGCCAAA GGCTCCAAAA ACAAATAAAC AAACTTTCAG ATCAATCAGC 
1501 TAATAATAAT AATGTCTTCA CAAAGAAAAC GGGGAATCAG CAGAAAGTGA 
1551 ATGATGCTTC AGTAAACACA GACCCAGCCA CTTCTGCCTC TACTGTAGAT 

10 IbOl GTAAAGCCAT CACCTTCTGC AGCAGAGGCA GATTTTGACA TAGTAACAAA 
lt.51 GGGGCAAGTC TGTGAAATGA CCAAAGAAAT TGCTGACAAA ACAGAAAAGT 
1701 ATAATAAATG TAAACAACTC TTGCAGGATG AGAAAGCAAA ATGCAATAAA 
1751 TATGCTGATG AACTTGCAAA AATGGAGCTG AAATGGAAAG AACAAGTGAA 
1601 AATTGCTGAA AATGTAAAAC TTGAACTAGC TGAAGTACAG GACAATTATA 

15 1651 AAGAACTTAA AAGGAGTCTA GAAAATCCAG CAGAAAGGAA AATGGAAGGT 
nOl CAGAATTCCC AGAGTCCTCA ATGTTTCAAA ACATGCTCAG AGCAAAATGG 
n51 TTATGTTCTC ACATTGTCAA ATGCACAACC AGTTCTGCAA TATGGTAATC 
2001 CTTATGCATC TCAGGAAACA AGAGATGGAG CAGATGGTGC TTTTTACCCA 
2051 GATGAAATAC AAAGGCCACC TGTCAGAGTC CCCTCTTGGG GACTGGAAGA 

20 2101 CAATGTTGTC TGCAGCCAGC CTGCTCGAAA CTTTAGTCGG CCTGATGGCT 
2151 TAGAGGACTC TGAGGATAGC AAAGAAGATG AGAATGTGCC TACTGCTCCT 
22D1 GATCCTCCAA GTCAACATTT ACGTGGGCAT GGGACAGGCT TTTGCTTTGA 
2251 TTCCAGCTTT GATGTTCACA AGAAGTGTCC CCTCTGTGAG TTAATGTTTC 
2301 CTCCTAACTA TGATCAGAGC AAATTTGAAG AACATGTTGA AAGTCACTGG 

25 2351 AAGGTGTGCC CGATGTGCAG CGAGCAGTTC CCTCCTGACT ATGACCAGCA 
SHQ1 GGTGTTTGAA AGGCATGTGC AGACCCATTT TGATCAGAAT GTTCTAAATT 
2M51 TTGACTAGTT ACTTTTTATT ATGAGTTAAT ATAGTTTAGC AGTAAAAAAA 
2501 AAAAAAAAAA ACCACACCTA AAATAGACCA CTGAGGAGAC CATAGAGCGG 
2551 ATGCTTTCAT GCACCCTTTA CTGCACTTTC TGACCAGGAG CTACTTTGAG 

30 2L01 TTTGGTGTTA CTAGGATCAG GGTCA6TCTT T6GCTTATCA ATAAATTTTA 
2h51 ATCTCTGTTA ATCTTACCTG CTTTAAAAAA AAGTTCTTGT GTGTTCGTAT 
2701 CTTTATTTAT TCCCTAGTTT GCAGAACTGT CTGAATAAAG GATACAAGGA 
2751 TTATTTCAAT GTTACTGCAC TGAAAAACGT GTATGTATTA GTGTGCTAGA 
2301 TTATTTAGCA GAATATTCAC AAGTTTCTGT TGACCTTGTT GATTGAGCAT 

35 2351 GACTACTAAA TATTATGTAA TAAAAAGCAT TTGTCATAAC AAAAAAAAAA 
2^01 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
2T51 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
3001 AAAAAAAAAA AAAAAAAAAA AAAAAAAA 

40 

BLAST Results 



No BLAST result 

45 



Medline entries 



50 TlBbnaM: 

De Valck Jin DY-i Heyninck Ki Van de Craen Contreras Ri 
Fiers lili 

Jeang KT Beyaert R-i The zinc finger protein A20 interacts with 
a 

55 novel 

anti-apoptotic protein which is cleaved by specific caspases- 
Oncogene 

l=m Jul 22,13(2^)^132-10 
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Peptide information for frame 2 



ORF from &*\ bp to EMS5 bp} peptide length: 73^ 
Category: known protein 
10 Classification: Cell division 



1 PITSF(3EVPL<3 TSNFAHVIFfl 

51 GIFKVGWSTA RDYYTFLUSP 

101 (3FCYVTHKGE IRGASTPFC2F 

15 151 ELKIEKTMKE KEELLKLIAV 

HOI AEfiKGLTEVT (2SLKI1ENEEF 

551 ELDSLKDKLK KA(3HERE(3LE 

301 VflTLKNLDGN KESVITHFKE 

351 FLKEflLRKAE EfiVfiATRtJEV 

20 401 VKKflLADAVA ELKLNAMKKD 

451 EKFKEC(2RL<2 KfilNKLSDlSS 

501 STVDVKPSPS AAEADFDIVT 

551 KCNKYADELA KMELKIilKEfiV 

bOl KHEGt3NS(3SP (2CFKTCSE(2N 

25 bSl AFYPDEIflRP PVRVPSUGLE 

701 PTAPDPPSfiH LRGHGTGFCF 

751 ESHUKVCPMC SEflFPPDYDd 



NVAKSYLPNA HLECHYTLTP YIHPHPOUV 

MPEHYVEGST VNCVLAFflGY YLPNDDGEFY 

RASSPVEELL TMEDEGNSDM LVVTTKAGLL 

LEKETAflLRE UVGRHERELN HEKERCDCJLfl 

KKRFSDATSK AHflLEEDIVS VTHKAIEKET 

CdLKTEKDEK ELYKVHLKNT EIENTKLflSE 

EIGRLflLCLA EKENLfiRTFL LTTSSKEDTC 

VFLAKELSDA VNVRDRTMAD LHTARLENEK 

(2DKTDTLEHE LRREVEDLKL RLflilAADHYK 

ANNNNVFTKK TGNUOKVNDA SVNTDPATSA 

KGflVCEMTKE IADKTEKYNK CKflLLflDEKA 

KIAENVKLEL AEVCJDNYKEL KRSLENPAER 

GYVLTLSNAfl PVLflYGNPYA SflETRDGADG 

DNVVCSfiPAR NFSRPDGLED SEDSKEDENV 

DSSFDVHKKC PLCELUFPPN YDflSKFEEHV 
tJVFERHVflTH FDdNVLNFD 



30 

BLASTP hits 

No BLASTP hits available 

35 Alert BLASTP hits for DKFZphamy2_2bn ■. frame S 

TREHBL:HS33fi211_l product: "taxi-binding protein TXBP151"i Homo 
sapiens taxi-binding protein TXBP151 mRNAi complete cds-i N = Ei 
Score 
40 = STMfl-, P = 0 



>TREMBL:HS33fl211_l product: "taxi-binding protein TXBP151"i Homo 
sapiens 

45 taxi-binding protein TXBP151 mRNAi complete cds- 

Length = 7M7 

HSPs: 

50 Score = 2146 (442-3 bits)-. Expect = O-Oe+DO-. Sum P(2) = O-Oe+00 
Identities = S75/L03 (15X)-, Positives = 57b/b03 (ISZ) 

fluery: 1 

MTSFl2EVPL(2TSNFAHVlF(2NVAKSYLPNAHLECHYTLTPYIHPHPKDIi)VGIFKvGlilSTA bO 

55 

nTSFflEVPLUTSNFAHVIFiSNVAKSYLPNAHLECHYTLTPYIHPHPKDIiJVGIFKVGUSTA 
Sbjct: 1 

HTSFtfEVPLflTSNFAHVIFflNVAKSYLPNAHLECHYTLTPYIHPHPKDUVGIFKVGUSTA bO 
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duery: bl 

RDYYTFLWSPIIPEHYVEGSTVNCVLAFflGYYLPNDDGEFYfiFCYVTHKGEIRGASTPFfiF 120 

5 RDYYTFLli)SPMPEHYVEGSTVNCVLAF(2GYYLPNI>l>GEFY<3FCYVTHK6EIRGASTPF<2F 
Sbjct: bl 

RI>YYTFLUSPnPEHYVEGSTVNCVLAFl2GYYLPN]>I>GEFY(2FCYVTHKGEIRGASTPF(3F 120 
(Juery: 121 

10 RASSPVEELLTnEDEGNSDMLVVTTKAGXXXXXXXXXXXXXXXXXXXXXXXXXXTAiaLRE IflQ 

RASSPVEELLTtlEDEGNSDIILVVTTKAG 

TAtJLRE 

Sbjct: 121 

RASSPVEELLTMEDEGNSDNLVVTTKAGLLELKIEKTMKEKEELLKLIAVLEKETAflLRE 180 

15 

Query: Ifil 

(2VGRMERELNHEKERCD(3Lc2AEc2KGLTEVT<2SLKMENEEFKKRFS]>ATSKAH(2LEE]>IVS 240 
<3VGRMERELNHEKERCDi3Li2AE<2KGLTEVT<3SLKMENEEFKKRFS]>ATSKAH 

+EEDIVS 
20 Sbjct: Ifll 

(3VGRI1ERELNHEKERCI>(3L(3AE(3KGLTEVT(aSLKMENEEFKKRFSl>ATSKAHHVEEDIVS 240 

(2uery: 241 

• VTHKAIEKETELDSLKDKL<KA(3HERE(2LEC(3LKTEKDEKELYKVHLKNTEIENTKLI1SE 300 

25 

VTHKAIEKETELDSLKDKLKKA(3HERE(2LEC(2LKTEKDEKELYKVHLKNTEIENTKLnSE 
Sbjct: 2M1 

VTHKAIEKETELDSLKDKLKKAUHEREflLECflLKTEKDEKELYKVHLKNTEIENTKLIISE 300 

30 fluery: 3D1 

V(3TLKNLDGNKESVITHFKEEIGRLl2LCLAEKENL(2RTFLLTTSSKEDTCFLKEl3LRKAE 3b0 

VflTLKNLDGNKESVITHFKEEIGRLflLCLAEKENLflRTFLLTTSSKEDTCFLKEflLRKAE 
Sbjct: 3D1 

35 V(3TLKNLDGNKESVITHFKEEIGRLfiLCLAEKENL<2RTFLLTTSSKEDTCFLKEl3LRKAE 3b0 
(Suery: 3bl 

Ei3VflATR(3EVVFLAKELSDAVNVRDRTriADLHTARLENEKVKK(2LADAVAELKLNAriKK]> 420 

40 EfiV(2ATR(3EVVFLAKELSl>AVNVRDRTHADLHTARLENEKVKKflLA]>AVAELKLNAnKK]> 
Sbjct: 3bl 

Et2VCATR(2EVVFLAKELS])AVNVRI>RTriADLHTARLENEKVKK(2LA]>AVAELKLNAI1KKI> M20 
fiuery: 421 

45 (3]>KTI>TLEHELRREVEDLKLRL(2riAADHYKEKFKEC(2RLt3K(2INKLSl>(JSANNNNVFTKK 4flO 

(2DKTDTLEHELRREVEDLKLRL(2I1AADHYKEKFKEC<3RL(2IC(JINKLS1>(2SANNNNVFTKK 
Sbjct: 421 

<3DKTI>TLEHELRREVEI>LKLRL(2l1AAI>HYKEKFKEC(2RL(3KflINKLSI>(JSANNNNVFTKK HAD 

50 

(Juery: 4B1 

TGNfiflKVNDASVNTDPATSASTVDVKPSPSAAEADFDIVTKGflVCEMTKEIADKTEKYNK 540 

TGNdfiKVNDASVNTDPATSASTVDVKPSPSAAEADFDIVTKGflVCEMTKEIADKTEKYNK 
55 Sbjct: iifil 

TGNflflKVNDASVNTDPATSASTVDVKPSPSAAEADFDIVTKGlJVCEMTKEIADKTEKYNK 540 
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fluery: 511 

CK(3LLflI>E<AKCNKYAI>ELAKMELKIiJKE(3VKIAENVKLELAEV(3]>NyKELKRSLENPAER bDD 

CK(aLLflDEKAKCNKYAPELAKMELKUKE(3VKIAENVKLELAEVt2I>NYKELKRSLENPAER 
5 Sbjct: 541 

CK(3LL(3I>EKAKCNKYADELAKI1ELKlilKE(2VKIAENVKLELAEVi33>NYKELKRSLENPAER bOD 

fluery: bOl KME bD3 
KI1E 

10 Sbjct: blDl KME b03 

Score = A31 (124-7 bits)-. Expect = D-De+DD-, Sum P(2) = D.Oe+DO 
Identities = 147/153 (lb*)-. Positives = 141/153 (17X) 

15 fiuery: b37 

NPYASflETRDGADGAFYPDEIfiRPPVRVPSIilGLEDNVVCSflPARNFSRPDGLEDSEDSKE bib 
NP A ++ 

DGADGAFYPDEIflRPPVRVPSUGLEDNVVCSfiPARNFSRPDGLEDSEDSKE 
Sbjct: Sib NP- 

20 AERKnEDGADGAFYPDEIflRPPVRVPSUGLEDNVVCSflPARNFSRPDGLEDSEDSKE b54 
fluery: b17 

DENVPTAPDPPSflHLRGHGTGFCFDSSFDVHKKCPLCELIIFPPNYlXaSKFEEHVESHIilKV 75b 

25 DENVPTAPDPPSflHLRGHGTGFCFDSSFDVHKKCPLCELMFPPNYDflSKFEEHVESHlilKV 
Sbjct: bSS 

DENVPTAPDPPSdHLRGHGTGFCFDSSFDVHKKCPLCELNFPPNYDfiSKFEEHVESHliJKV 71N 

fluery: 757 CPflCSEflFPPDYDJjflVFERHVflTHFDflNVLNFD 7fl1 
30 CPnCSElJFPPBYDflflVFERHVflTHFPflNVLNFD 

Sbjct: 715 CPMCSE(3FPPDYDlJl3VFERHV(2THFDi2NVLNFD 747 

Score = 104 (15. b bits)-. Expect = 1.2e-02-> Sum P(2) = fl.fle-02 
Identities = flD/351 (22/C) -. Positives = 157/351 (44JO 

35 

fluery: 177 (JLR E<3VGRMERELNH- 

EKERCDdLflAEdKGLTEVTtJSLKHENEEFKKRFSDATSKAH 232 

(JLR EflV +E+ KE D + + + ++ + ++ENE+ KK+ 

+ DA 

40 Sbjct: 355 ALRKAEElSVCiATRflEv'VFLAKELSDAVNVRDRTnADL- 
HTARLENEKVKKflLADA 4Dfl 

fluery: 233 (3LEEDIVSVTHKAIEKETE- 
LDSLKDKLKKA<2HEREflLEC(2LKTEKDEKELYKVHLKNTE 211 
45 + + A++K+ + D+L+ +L++ E E L+ +L+ D 

YK K + 

Sbjct: 4D1 VAELKLNAMKKDflDKTDTLEHELRR EVEDLKLRLflNAADH 

YKEKFKECfl 457 

50 fluery: 212 

IENTKLMSEVflTLKNLDGNKESVITHFKEEIGRLflLCLAEKENLtlJRTFLLTTSSKEDTCF 351 

+L ++ L + N +VT ++ G a N T 

T++S D 

Sbjct: 456 RLflKfllNKLSDdSANNNNVFT KKTGNt2<2KVNDASVN — - 

55 TDPATSASTVD— 504 

fluery: 352 LKEULRKAEEflVU- 

ATRflEVVFLAKELSDAVNVRDRTMADLHTARLENEKVKKflLADAVA 410 
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+K AE T+ +V + KE++D ++ L + + K 

+LA 

Sbjct: SOS 

V<PSPSAAEADFDIVTKGt3VCEHTKEIADKTEKYNKCK(2LLflI>EKAKCNKYAl>ELAKf1EL 5b1 

5 

fiuery: mi ELKLNAMKKDflDKTDTLE HELRREVED-LKLRLflMAAD— 

HYKEKFKEC<2-RL(2K Ibl 

+ K + K + E EL+R +E+ + +++ AD Y ++ + 

R + 

10 Sbjct: 5b5 

KWKEflVKIAENVKLELAEVCDNYKELKRSLENPAERKMEDGADGAFYPDEIflRPPVRVPS bEI 

fluery: IbS — fllNKLSDflSANNNNVFTKKTG--- 
NflflKVNDASVNTDPATSASTVDVKPSPSAAEAD S15 
15 + N + 0 A N F++ G ++ D +V T P + + 

+ ++ 

Sbjct: bE5 UGLEDNVVCSflPARN--- 
FSRPDGLEDSEDSKEDENVPTAPDPPSflHLRGHGTGFCFDSS bfll 

20 fluery: Sit, FDIVTKGflVCEfl SE7 

FD+ K +CE+ 
Sbjct: bflE FDVHKKCPLCEL 

25 Pedant information for DKFZphamy2_Ebn, frame E 

Report for DKFZphamyE_Ebn . E 

30 

[LENGTH! 7AT 

EMU) J ^0677-147 

ICpIl 5-30 

EHOHOLJ TREMBL : HS33flEll_l product: "taxi-binding protein 

35 TXBPlSl"i Homo sapiens taxi-binding protein TXBP151 mRNA-i 
complete cds- □•□ 

EFUNCAT2 11 unclassified proteins IS. cerevisiae-i YORElbcl 

3e-m 

EFUNCATJ Ofl-07 vesicular transport (golgi network, etc) ES- 
40 cerevisiae, YDLOSfiuO Ee-13 

EFUNCAT3 30-03 organization of cytoplasm ES- cerevisiae, 
YDLDSflwl Se-13 

EFUNCAT1 OLIO nuclear biogenesis ES. cerevisiae-i Y»R3Sbw3 

He-13 

45 EFUNCAT3 30-01 organization of cytoskeleton ES- cerevisiae, 

YDR35bw] 1e-13 

EFUNCAT3 D3-EE cell cycle control and mitosis ES- cerevisiae, 
YDR3Sbu]l Me-13 

EFUNCAT3 11-01 dna repair (direct repair, base excision repair 
50 and nucleotide excision repair) ES. cerevisiae-i YKROTSwl 7e-lE 
EFUNCAT3 30-10 nuclear organization ES- cerevisiae-i YKRO^SwiB 
7e-lS 

EFUNCAT3 03-ES cytokinesis ES- cerevisiae, YHR0E3w I1Y01 - 
myosin-1 isoformJ be-ll 
55 EFUNCAT3 0fl-EE cytoskeleton-dependent transport ES- cerevisiae, 
YHR0S3w HY01 - mybsin-1 isoform]l be-ll 

EFUNCAT3 03-01 budding, cell polarity and filament formation 
ES- cerevisiae, YHR0E3w MY01 - myosin-1 isoformJ be-ll 
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CFUNCAT3 1 genome replication-* transcription! recombination and 
repair CM - jannaschiii IU13ES3 3e-0fl 

EFUNCAT3 Tfi classification not yet clear-cut CS- cerevisiaei 

YJR134c]) Me-Ofl 

5 EFUNCAT3 03. n recombination and dna repair IS- cerevisiaei 

YNLS50w3 Ee-07 

EFUNCAT3 03.13 meiosis IS- cerevisiaei YNLE50w3 Ee-D7 
CFUNCAT3 03-01 cell growth IS- cerevisiaei YNL07Tc3 Se-Ob 
[FUNCAT3 03-07 pheromone response! mating-type determination! 
10 sex-specific proteins ICS- cerevisiae-i ^YNL07Tc3 Ee-Ob 

EFUNCAT3 Qfl-TT other intracellular-transport activities IS- 
cerevisiaen YNLQ71C3 Se-Ob 

EFUNCAT3 OT-13 biogenesis of chromosome structure ES- 
cerevisiaei YLRDflbw3 5e-0b 
15 EFUNCAT3 11-01 stress response ES. cerevisiaei YPRlMlc3 Ee-OS 
EFUNCAT3 0b. 10 assembly of protein complexes ICS- cerevisiaei 

YPRimcI Ee-OS 

IFUNCAT3 03-SE.01 cell cycle check point proteins IS- 
cerevisiaei YGL0Sbw3 Ee-OS 
20 CFUNCAT3 30-05 organization of centrosome IS- cerevisiaei 
YPR141c3 Ee-OS 

[FUNCAT3 Ofi-lb extracellular transport IS - cerevisiaei 

Y0R3Ebw3 le-O^t 

EFUNCAT3 OT-ES vacuolar and lysosomal biogenesis IS. 
cerevisiaei Y0R3Ebw3 le-0 1 *

CFUNCAT3 3D. lb mitochondrial organization IS- cerevisiae-i 
YAL011w3 Ee-OH 

EFUNCAT3 Ob- 07 protein modification (glycosylation, acylation,
myristylation, palmitylation, farnesylation and processing)
IS- cerevisiae-. YKLSOlcJ Se-OM

IFUNCAT3 e amino acid metabolism and transport EM- genitaliumi 
MG04E3 Me-QM 

EFUNCAT3 30-13 organization of chromosome structure IS- 
cerevisiaei YDREflSwJ 7e-04 
EFUNCAT3 n secretion and adhesion CM. jannaschiii MJOE^ll

0-001 

EFUNCAT3 05. 0M translation (initiationi elongation and 
termination) IS- cerevisiaei YAL035w3 0-001 
IBL0CKS3 BL003EbD Tropomyosins proteins 
CBL0CKS3 PR0Q5M5E
[BLOCKS! PR000M1F 

ESC0P3 dEtmab_ 1.105-4.1-1 Tropomyosin Crabbit 

(Oryctolagus cuniculus) Se-DS 

EEC! 3-b.l-3E Myosin ATPase 5e-lb 

CPIRKU3 nucleus Ee-3S

[PIRKU3 phosphotransferase Se-10 

EPIRKW3 duplication Ee-OT 

EPIRKU3 citrulline 76-0=1 

EPIRKU3 tandem repeat Ee-13 

EPIRKU3 heterodimer 2e-0fl

EPIRKU3 heart Be-11 

EPIRKIO endocytosis 3e-10 

EPIRKliD polymorphism le-OT 

EPIRKW3 transmembrane protein be-lE 

EPIRKIO serine/threonine-specific protein kinase Se-10

CPIRKIO cell wall 7e-01 

EPIRKW3 zinc finger 3e-10 

EPIRKU3 surface antigen be-Ofl 
IPIRKliD DNA binding be-lE

IPIRKliD metal binding 3e-10 

IPIRKliD muscle contraction Se-13 

IPIRKliD brain fle-Dfl 

IPIRKliD acetylated amino end Me-DT

IPIRKliD actin binding Se-lb 

IPIRKliD endoplasmic reticulum He-CH 

IPIRKliD mitosis 3e-15 

IPIRKliD microtubule binding 3e-15 

IPIRKU3 ATP Se-lb

IPIRKliD chromosomal protein 2e-0fl 

IPIRKliD receptor 4e-lD 

IPIRKliD thick filament Se-13 

IPIRKliD phosphoprotein Se-lb 

IPIRKliD glycoprotein 4e-10

IPIRKliD skeletal muscle 7e-ll 

IPIRKliD calcium binding 7e-D^ 

IPIRKliD alternative splicing 3e-13 

IPIRKliD DNA condensation 2e-0fl 

IPIRKliD coiled coil Se-lb

CPIRKliD P-loop Se-lb 

CPIRKIjO heptad repeat 3e-13 

IPIRKliD methylated amino acid Ee-13 

IPIRKliD basement membrane le-OT 

IPIRKliD immunoglobulin receptor 2e-CH

IPIRKliD peripheral membrane protein 3e-lD 

IPIRKliD cardiac muscle Ee-11 

IPIRKliD extracellular matrix le-CH 

IPIRKliD hydrolase Se-lb * 

IPIRKliD microtubule le-11

IPIRKliD muscle le-CH 

IPIRKliD membrane protein le-OT 

IPIRKliD EF hand 7e-DT 

IPIRKliD protein biosynthesis Me-CH 

IPIRKliD cytoskeleton 3e-13

IPIRKliD hair 76-01 

IPIRKliD Golgi apparatus le-11 

IPIRKbD calmodulin binding 3e-10 

ISUPFAfD myosin heavy chain Se-lb 

ISUPFAilD conserved hypothetical PUS protein 4e-10

ISUPFAfD IgA Fc receptor 7e-DT 

ISUPFAfD centromere protein E 3e-15 

ISUPFAfD unassigned Ser/Thr or Tyr-specific protein kinases 5e-10

ISUPFAfD calmodulin repeat homology 7e-DT

ISUPFAfD myosin motor domain homology Se-lb 

ISUPFAfD alpha-actinin actin-binding domain homology Se-10

ISUPFAfD hypothetical protein MJO^m 4e-0fl 

ISUPFAfD tropomyosin be-CH 

ISUPFAfD plectin Se-10

ISUPFAfD trichohyalin 7e-CH 

ISUPFAfD pleckstrin repeat homology le-Ofl 

ISUPFAfD ribosomal protein SID homology Se-10 

ISUPFAfD giantin 4e-13 

ISUPFAfD protein kinase homology Se-10

ISUPFAfD protein kinase C zinc-binding repeat homology le-Dfl 

ISUPFAfD kinesin motor domain homology 3e-lS 

ISUPFAfD human early endosome antigen 1 3e-lD 
ESUPFAfO myosin MYOE fie-Dfl

ISUPFAI1J unassigned kinesin-related proteins le-10 

ESUPFAPO MS protein 3e-10 

ESUPFAPO cytoskeletal keratin 4e-D7 

EKIiO All_Alpha 

EKIiO L0U_C0I1PLEXITY 3-3D V. 

EKIO C0ILED_C0IL EH-lfl 'A 



SEfl MTSFflEVPLflTSNFAHVIFflNVAKSYLPNAHLECHYTLTPYIHPHPKDUVGIFKVGIilSTA

SEG 

PRD ccceeeeeccccceeeeeccccccccccccceeeeeccccccccccccceeeeeeecccc 
COILS 




SEC RDYYTFLUSPnPEHYVEGSTVNCVLAFflGYYLPNDDGEFYflFCYVTHKGEIRGASTPFflF 

SEG 

PRD eeeeeeeecccccccccccccceeeecccccccccccceeeeeeeccccccccccccccc 
COILS 



SEC RASSPVEELLTMEDEGNS]>riLVVTTKAGLLELKIEKTf1KEKEELLKLIAVLEKETA(3LRE 

SEG • xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEfl flVGRMERELNHEKERCDflLflAEflKGLTEVTflSLKMENEEFKKRFSDATSKAHflLEEDIVS 

SEG - 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS 

CCCCCCCCCCCCCCCCCCC CCCCCCCC 



SEfl VTHKAIEKETELDSLKDKLKKAflHEREflLECflLKTEKDEKELYKVHLKNTEIENTKLMSE 

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEfl VflTLKNLDGNKESVITHFKEEIGRLflLCLAEKENLflRTFLLTTSSKEDTCFLKEflLRKAE

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCC 


SEfl EflVflATRflEVVFLAKELSDAVNVRDRTnADLHTARLENEKVKKflLADAVAELKLNAPIKKD 

SEG - 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCC



SEfl ADKTDTLEHELRREVEDLKLRLflllAADHYKEKFKECflRLflKfllNKLSDflSANNNNVFTKK 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhn 
COILS



SEfl TGNflflKVNDASVNTDPATSASTVDVKPSPSAAEADFDIVTKGflVCEMTKEIADKTEKYNK 
SEG

PRD hhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



SE<3 CKflLLflDEKAKCNKYADELAKIIELKlilKEflViaAENVKLELAEVfiDNYKELKRSLENPAER 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCC

SEA KHEGdNS(2SP<2CFICTCSE<2NGYVLTLSNAt2PVLl2YGNPYAS<2ETRDGADGAFYPDEI<2RP 

SEG 

PRD hhhhhhcccchhhhhhhhhhheeeeccccceeeecccccccccccccccccccccccccc 
COILS



SE(2 PVRVPSUGLEDNVVCSflPARNFSRPDGLEDSEDSKEDENVPTAPDPPSflHLRGHGTGFCF 

SEG 

PRD ccccccccccceeeeccccccccccccccccccccccccccccccccccccccccccccc
COILS 



SE<2 DSSFDVHKKCPLCELnFPPNYD(3SKFEEHVESHUKVCPriCSEflFPPDYD(J(2VFERHV(2TH 

SEG

PRD ccccccccccccccccccccccchhhhhhhhhhhhccccccccccccchhhhhhhhhhhh 
COILS 



SE<2 FDflNVLNFD

SEG 

PRD hcceeeccc 
COILS 




(No Prosite data available for DKFZphamy2_2bl c J . E) 
(No Pfam data available for DKFZphamy2_Ebn • S ) 
group: metabolism

DKFZphamy2_2c22 encodes a novel 364 amino acid protein with
similarity to the 1-acyl-glycerol-3-phosphate acyltransferase of
Zea mays.


It contains one leucine zipper. The protein is believed to play a
role in fatty acid metabolism. It is ubiquitously expressed, with a
slight predominance in uterus, placenta and foreskin.
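A leucine zipper of the kind noted above is commonly described by the PROSITE-style pattern L-x(6)-L-x(6)-L-x(6)-L, which places a leucine at every seventh position. Below is a minimal scan for that pattern. It assumes the pattern is exactly as written, and it runs on residues 101-150 of the predicted peptide as read from the listing, with OCR-damaged letters read as Q. Only the leucine positions affect the result. The function name `find_zippers` is illustrative.

```python
import re

# L-x(6)-L-x(6)-L-x(6)-L written as a regular expression. The lookahead lets
# overlapping candidate zippers be reported instead of being skipped.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")

def find_zippers(peptide, offset=1):
    """Return (start, end) residue numbers (inclusive) of each pattern match.

    `offset` is the residue number of the first character of `peptide`.
    """
    hits = []
    for m in ZIPPER.finditer(peptide):
        start = m.start() + offset
        hits.append((start, start + len(m.group(1)) - 1))
    return hits

segment = "VADILAIRQNALGHVRYVLKEGLKWLPLYGCYFAQHGGIYVKRSAKFNEK"  # residues 101-150
print(find_zippers(segment, offset=101))  # [(105, 126)]
```

This single hit spans 105-126, the region given for the LEUCINE_ZIPPER motif in the peptide information below. A full PROSITE scan would also apply the entry's own rules, which this sketch omits.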

The new protein can find application in modulation of fatty acid
metabolism and as a new enzyme for biotechnological production
processes.



weak similarity to 1-acyl-glycerol-3-phosphate acyltransferase
(Zea mays)

perhaps complete cds.


Sequenced by MediGenomix

Locus: /map="fl" 

Insert length: 3403 bp

Poly A stretch at pos. 3373, polyadenylation signal at pos. 3351
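Both positions can be recovered from the sequence listing. The sketch below makes two assumptions that are not taken from the record. First, the poly(A) tail is the longest 3' stretch that starts with an A and is at least 90% A; this tolerates the single G inside this tail. Second, the signal is the canonical AATAAA hexamer nearest to the tail within 50 nt upstream. The function names are illustrative.

```python
SIGNAL = "AATAAA"  # canonical polyadenylation hexamer

def poly_a_start(seq, min_frac=0.9):
    """1-based start of the 3' poly(A) tail, or None.

    The tail is the longest suffix that begins with an A and is at least
    `min_frac` A, so occasional non-A bases inside the tail are tolerated.
    """
    best = None
    a_count = 0
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "A":
            a_count += 1
            if a_count / (len(seq) - i) >= min_frac:
                best = i  # keep moving left while the suffix still qualifies
    return None if best is None else best + 1

def signal_start(seq, tail_start, window=50):
    """1-based start of the AATAAA closest to the tail within `window` nt, or None."""
    end = tail_start - 1               # 0-based index of the first tail base
    lo = max(0, end - window)
    idx = seq.rfind(SIGNAL, lo, end)   # hexamer must lie wholly upstream of the tail
    return None if idx < 0 else idx + 1

# Positions 3301-3403 of the DKFZphamy2_2c22 insert, copied from the listing.
tail = "".join([
    "GCTTTTATGT", "TTTTCAGAAA", "AGGGTGTGTT", "TGGATGAAAG", "TAAAAAAAAA",
    "AAATAAAATC", "TTTCACTGTC", "TCTAAAAAAA", "AAAGAAAAAA", "AAAAAAAAAA",
    "AAA",
])
offset = 3300
pa = poly_a_start(tail)
print(offset + pa, offset + signal_start(tail, pa))  # 3374 3352
```

The sketch gives 1-based starts of 3374 and 3352. The record lists 3373 and 3351, each one lower. That would fit a pipeline reporting 0-based offsets, but this is an inference, not something the record states.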



1 AGATGCTGCT GTCCCTGGTG CTCCACACGT ACTCCATGCG CTACCTGCTG
51 CCCAGCGTCG TGCTCCTGGG CACGGCGCCC ACCTACGTGT TGGCCTGGGG
101 GGTCTGGCGG CTGCTCTCCG CCTTCCTGCC CGCCCGCTTC TACCAAGCGC
151 TGGACGACCG GCTCTACTGC GTCTACCAGA GCATGGTGCT CTTCTTCTTC
201 GAGAATTACA CCGGGGTCCA GATATTGCTA TATGGAGATT TGCCAAAAAA
251 TAAAGAAAAT ATAATATATT TAGCAAATCA TCAAAGCACA GTTGACTGGA
301 TTGTTGCTGA CATCTTGGCC ATCAGGCAGA ATGCGCTAGG ACATGTGCGC
351 TACGTGCTGA AAGAAGGGTT AAAATGGCTG CCATTGTATG GGTGTTACTT
401 TGCTCAGCAT GGAGGAATCT ATGTAAAGCG CAGTGCCAAA TTTAACGAGA
451 AAGAGATGCG AAACAAGTTG CAGAGCTACG TGGACGCAGG AACTCCAATG
501 TATCTTGTGA TTTTTCCAGA AGGTACAAGG TATAATCCAG AGCAAACAAA
551 AGTCCTTTCA GCTAGTCAGG CATTTGCTGC CCAACGTGGC CTTGCAGTAT
601 TAAAACATGT GCTAACACCA CGAATAAAGG CAACTCACGT TGCTTTTGAT
651 TGCATGAAGA ATTATTTAGA TGCAATTTAT GATGTTACGG TGGTTTATGA
701 AGGGAAAGAC GATGGAGGGC AGCGAAGAGA GTCACCGACC ATGACGGAAT
751 TTCTCTGCAA AGAATGTCCA AAAATTCATA TTCACATTGA TCGTATCGAC
801 AAAAAAGATG TCCCAGAAGA ACAAGAACAT ATGAGAAGAT GGCTGCATGA
851 ACGTTTCGAA ATCAAAGATA AGATGCTTAT AGAATTTTAT GAGTCACCAG
901 ATCCAGAAAG AAGAAAAAGA TTTCCTGGGA AAAGTGTTAA TTCCAAATTA
951 AGTATCAAGA AGACTTTACC ATCAATGTTG ATCTTAAGTG GTTTGACTGC
1001 AGGCATGCTT ATGACCGATG CTGGAAGGAA GCTGTATGTG AACACCTGGA
1051 TATATGGAAC CCTACTTGGC TGCCTGTGGG TTACTATTAA AGCATAGACA
1101 AGTAGCTGTC TCCAGACAGT GGGATGTGCT ACATTGTCTA TTTTTGGCGG
1151 CTGCACATGA CATCAAATTG TTTCCTGAAT TTATTAAGGA GTGTAAATAA
1201 AGCCTTGTTG ATTGAAGATT GGATAATAGA ATTTGTGACG AAAGCTGATA
1251 TGCAATGGTC TTGGGCAAAC ATACCTGGTT GTACAACTTT AGCATCGGGG
1301 CTGCTGGAAG GGTAAAAGCT AAATGGAGTT TCTCCTGCTC TGTCCATTTC
1351 CTATGAACTA ATGACAACTT GAGAAGGCTG GGAGGATTGT GTATTTTGCA
1401 AGTCAGATGG CTGCATTTTT GAGCATTAAT TTGCAGCGTA TTTCACTTTT
1451 TCTGTTATTT TCAATTTATT ACAACTTGAC AGCTCCAAGC TCTTATTACT
1501 AAAGTATTTA GTATCTTGCA GCTAGTTAAT ATTTCATCTT TTGCTTATTT
1551 CTACAAGTCA GTGAAATAAA TTGTATTTAG GAAGTGTCAG GATGTTCAAA
1601 GGAAAGGGTA AAAAGTGTTC ATGGGGAAAA AGCTCTGTTT AGCACATGAT
1651 TTTATTGTAT TGCGTTATTA GCTGATTTTA CTCATTTTAT ATTTGCAAAA
1701 TAAATTTCTA ATATTTATTG AAATTGCTTA ATTTGCACAC CCTGTACACA
1751 CAGAAAATGG TATAAAATAT GAGAACGAAG TTTAAAATTG TGACTCTGAT
1801 TCATTATAGC AGAACTTTAA ATTTCCCAGC TTTTTGAAGA TTTAAGCTAC
1851 GCTATTAGTA CTTCCCTTTG TCTGTGCCAT AAGTGCTTGA AAACGTTAAG
1901 GTTTTCTGTT TTGTTTTGTT TTTTTAATAT CAAAAGAGTC GGTGTGAACC
1951 TTGGTTGGAC CCCAAGTTCA CAAGATTTTT AAGGTGATGA GAGCCTGCAG
2001 ACATTCTGCC TAGATTTACT AGCGTGTGCC TTTTGCCTGC TTCTCTTTGA
2051 TTTCACAGAA TATTCATTCA GAAGTCGCGT TTCTGTAGTG TGGTGGATTC
2101 CCACTGGGCT CTGGTCCTTC CCTTGGATCC CGTCAGTGGT GCTGCTCAGC
2151 GGCTTGCACG CAGACTTGCT AGGAAGAAAT GCAGAGCCAG CCTGTGCTGC
2201 CCACTTTCAG AGTTGAACTC TTTAAGCCCT TGTGAGTGGG CTTCACCAGC
2251 TACTGCAGAG GCATTTTGCA TTTGTCTGTG TCAAGAAGTT CACCTTCTCA
2301 AGCCAGTGAA ATACAGACTT AATTTGTCAT GACTGAACGA ATTTGTTTAT
2351 TTCCCATTAG GTTTAGTGGA GCTACACATT AATATGTATC GCCTTAGAGC
2401 AAGAGCTGTG TTCCAGGAAC CAGATCACGA TTTTTAGCCA TGGAACAATA
2451 TATCCCATGG GAGAAGACCT TTCAGTGTGA ACTGTTCTAT TTTTGTGTTA
2501 TAATTTAAAC TTCGATTTCC TCATAGTCCT TTAAGTTGAC ATTTCTGCTT
2551 ACTGCTACTG GATTTTTGCT GCAGAAATAT ATCAGTGGCC CACATTAAAC
2601 ATACCAGTTG GATCATGATA AGCAAAATGA AAGAAATAAT GATTAAGGGA
2651 AAATTAAGTG ACTGTGTTAC ACTGCTTCTC CCATGCCAGA GAATAAACTC
2701 TTTCAAGCAT CATCTTTGAA GAGTCGTGTG GTGTGAATTG GTTTGTGTAC
2751 ATTAGAATGT ATGCACACAT CCATGGACAC TCAGGATATA GTTGGCCTAA
2801 TAATCGGGGC ATGGGTAAAA CTTATGAAAA TTTCCTCATG CTGAATTGTA
2851 ATTTTCTCTT ACCTGTAAAG TAAAATTTAG ATCAATTCCA TGTCTTTGTT
2901 AAGTACAGGG ATTTAATATA TTTTGAATAT AATGGGTATG TTCTAAATTT
2951 GAACTTTGAG AGGCAATACT GTTGGAATTA TGTGGATTCT AACTCATTTT
3001 AACAAGGTAG CCTGACCTGC ATAAGATCAC TTGAATGTTA GGTTTCATAG
3051 AACTATACTA ATCTTCTCAC AAAAGGTCTA TAAAATACAG TCGTTGAAAA
3101 AAATTTTGTA TCAAAATGTT TGGAAAATTA GAAGCTTCTC CTTAACCTGT
3151 ATTGATACTG ACTTGAATTA TTTTCTAAAA TTAAGAGCCG TATACCTACC
3201 TGTAAGTCTT TTCACATATC ATTTAAACTT TTGTTTGTAT TATTACTGAT
3251 TTACAGCTTA GTTATTAATT TTTCTTTATA AGAATGCCGT CGATGTGCAT
3301 GCTTTTATGT TTTTCAGAAA AGGGTGTGTT TGGATGAAAG TAAAAAAAAA
3351 AAATAAAATC TTTCACTGTC TCTAAAAAAA AAAGAAAAAA AAAAAAAAAA
3401 AAA


BLAST Results 



No BLAST result



Medline entries 




No Medline entry 






WO 01/98454 



PCT/IB01/02050



Peptide information for frame 3 



ORF from 3 bp to 1094 bp; peptide length: 364
Category: similarity to known protein
Classification: Metabolism
Prosite motifs: LEUCINE_ZIPPER <105-126>
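The peptide length follows directly from the ORF coordinates. The numbers here suggest that the end coordinate is the last base of the final sense codon, with the stop codon excluded. Under that assumption the span is 1094 - 3 + 1 = 1092 bp, or 364 codons. A minimal check (the function name is illustrative):

```python
def orf_peptide_length(start, end):
    """Residues encoded between 1-based inclusive coordinates start..end.

    Assumes `end` is the last base of the final sense codon, so the stop
    codon is not part of the span.
    """
    span = end - start + 1
    if span <= 0 or span % 3:
        raise ValueError(f"span of {span} bp is not a positive whole number of codons")
    return span // 3

print(orf_peptide_length(3, 1094))  # 364
```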


1 MLLSLVLHTY SMRYLLPSVV LLGTAPTYVL AWGVWRLLSA FLPARFYQAL
51 DDRLYCVYQS MVLFFFENYT GVQILLYGDL PKNKENIIYL ANHQSTVDWI
101 VADILAIRQN ALGHVRYVLK EGLKWLPLYG CYFAQHGGIY VKRSAKFNEK
151 EMRNKLQSYV DAGTPMYLVI FPEGTRYNPE QTKVLSASQA FAAQRGLAVL
201 KHVLTPRIKA THVAFDCMKN YLDAIYDVTV VYEGKDDGGQ RRESPTMTEF
251 LCKECPKIHI HIDRIDKKDV PEEQEHMRRW LHERFEIKDK MLIEFYESPD
301 PERRKRFPGK SVNSKLSIKK TLPSMLILSG LTAGMLMTDA GRKLYVNTWI
351 YGTLLGCLWV TIKA
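The peptide listing is the conceptual translation of the insert in the reading frame set by the ORF start at bp 3. Below is a minimal sketch of that step. It uses the standard genetic code, assumes a plain nucleotide string as input, and renders codons containing symbols other than ACGT as X. The function and variable names are illustrative.

```python
BASES = "TCAG"
# Standard genetic code with codons enumerated in TCAG order
# (first base slowest, third base fastest); "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, start=1, stop_at_stop=True):
    """Translate `dna` from 1-based position `start`, one codon at a time."""
    dna = dna.upper()
    peptide = []
    for i in range(start - 1, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")  # X for codons with non-ACGT symbols
        if aa == "*" and stop_at_stop:
            break
        peptide.append(aa)
    return "".join(peptide)

# bp 1-50 of the DKFZphamy2_2c22 insert, copied from the listing above.
row_1 = "".join(["AGATGCTGCT", "GTCCCTGGTG", "CTCCACACGT", "ACTCCATGCG", "CTACCTGCTG"])
print(translate(row_1, start=3))  # MLLSLVLHTYSMRYLL
```

Applied to bp 1051-1100 in the same frame, the same function stops at the TAG codon at bp 1095-1097 and returns the C-terminal YGTLLGCLWVTIKA.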


BLASTP hits 

No BLASTP hits available 


Alert BLASTP hits for DKFZphamy2_2c22, frame 3
No Alert BLASTP hits found
Pedant information for DKFZphamy2_2c22, frame 3









Report for DKFZphamy2_2c22.3



ELENGTH3 364
EflliD 42D72-M7 
Cpll T-lfl 

CH0I10L3 TREI1BL:CEAF313b_l gene: "F2flB3.5 n ; Caenorhabditis 

elegans cosmid F2AB3 • 2e-3b

CFUNCATJ =H unclassified proteins IS. cerevisiae-. YDRDlflcJ 

7e-13 

CFUNCATJ Ol-Dt-Ol lipid-, fatty-acid and sterol biosynthesis 
ES- cerevisiae-, YDL052cl Me-D5 
EFUNCATJ 30. ^ other cellular organization IS- cerevisiae-i
YDLQ52c3 Me-OS 
EBL0CKS3 BLD12b3A 
EBLOCKSJ BPDma^A 

EPIRKbD transmembrane protein 2e-ll 

IESUPFAM3I probable membrane protein YBRDM2c 2e-ll
EPR0SITE3 LEUCINE_ZIPPER 1 
EKU13 Alpha_Beta 
EKIiD L0U_C0HPLEXITY 3-57 '/. 
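The LOW_COMPLEXITY figure in the list above (OCR'd as 3-57) appears to be the share of residues that SEG masks with x. Under that assumed definition, the 13 masked positions in the SEG track of this report give 13/364 = 3.57 %. The same calculation reproduces the 4.65 % listed for a 215-residue protein with 10 masked positions. A minimal sketch (the function name is illustrative):

```python
def low_complexity_percent(seg_lines, length):
    """Percentage of residues masked ('x') by SEG, rounded to two decimals.

    `seg_lines` are the SEG strings of one report; `length` is the peptide length.
    """
    masked = sum(line.count("x") for line in seg_lines)
    return round(100 * masked / length, 2)

print(low_complexity_percent(["", "xxxxxxxxxxxxx", ""], 364))  # 3.57
```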



SE(J t1LLSLVLHTYSf1RYLLPSVVLLGTAPTYVLAUGVIiJRLLSAFLPARFY(2ALD])RLYCVYt!S 

SEG 

PRD ccchhhhhhhhhccccccceeecccceeeccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 
SEfl IIVLFFFENYTGVfllLLYGDLPKNKENIIYLANHOSTVDUIVADILAIRflNALGHVRYVLK

SEC 

PRD hhhhhhhceeGeeeeeeccccccccceeeeecccchhhhhhhhhhhhhccccchhhhhhh 


SEfl EGLKWLPLYGCYFAflHGGIYVKRSAKFNEKENRNKLflSYVDAGTPNYLVIFPEGTRYNPE 

SEG 

PRD hhhccccccceeeccceeeeeeccccccchhhhhhhhhhhccccceeeeeecccccchhh 

SEC flTKVLSASfiAFAACRGLAVLKHVLTPRIKATHVAFDCnKNYLDAIYDVTVVYEGKDDGGfl

SEG 

PRD hhhhhhhhhhhhhhhcccccceeeecccchhhhhhhhhhhcceeeceeeeeecccccccc 

SEfl RRESPTMTEFLCKECPKIHIHIDRIDKKDVPEEflEHNRRldLHERFEIKDKIILIEFYESPD 

SEG xxxxxxxxxxxxx

PRD cccccchhhhhccccceeeeeecccccccccccchhhhhhhhhhhhhhhhhhhhhhhccc 

SEfl PERRKRFPGKSVNSKLSIKKTLPSMLILSGLTAGHLIITDAGRKLYVNTIiIIYGTLLGCLUV 

SEG 

PRD cccccccccccchhhhhhhhchhhhhchhhhhhhhhhccccceeeeeeeechhhhhhhhh

SEfl TIKA 

SEG






PRD hccc 
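The PRD track labels each residue h (helix), e (extended strand) or c (coil). Tallying those states shows what a class label such as Alpha_Beta summarizes. The cut-offs Pedant uses to assign such classes are not given here, so the sketch below only reports fractions. Any other symbol, such as the stray G in one PRD line above (presumably OCR damage), is counted separately instead of being dropped. The function name is illustrative.

```python
from collections import Counter

def ss_composition(prd_lines):
    """Fractions of helix (h), strand (e), coil (c) and any other symbol."""
    counts = Counter("".join(prd_lines))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no PRD symbols given")
    comp = {state: counts.get(state, 0) / total for state in "hec"}
    comp["other"] = (total - sum(counts.get(s, 0) for s in "hec")) / total
    return comp

print(ss_composition(["hhhhee", "cc"]))
# {'h': 0.5, 'e': 0.25, 'c': 0.25, 'other': 0.0}
```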



Prosite for DKFZphamy2_2c22.3
PS00029 105->127 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphamy2_2c22.3)
DKFZphamy2_2f18






group: signal transduction

DKFZphamy2_2f18 encodes a novel 215 amino acid protein with
similarity to sodium channel protein beta1 of Rattus norvegicus.

The sodium channel protein beta 1 of Rattus norvegicus is crucial
in the assembly, expression and functional modulation of the
heterotrimeric complex of the rat brain sodium channel. The
expression of the new protein seems to be restricted to brain;
all matching ESTs isolated so far derive from there.


The new protein can find application in modulating the sodium
channel beta subunit and in studying the expression profile in
neurodegenerative diseases and of amygdala-specific genes.



similarity to sodium channel protein beta1 (Rattus norvegicus)
Pedant: SIGNAL_PEPTIDE 


Sequenced by MediGenomix 



Locus: unknown 



Insert length: 4052 bp

Poly A stretch at pos. 4035, no polyadenylation signal found



1 CAGGGCTGAC AGCACACACG GCCTGGGGGC CTAGAGAAGG ATTGCTGATC
51 ACCTGCCACC CAGGGTCGGG GCCCCGCACC ATCCGGGGGC GAGCTCCCGG
101 GAAGGGGCTC CCCCTCTACA CCCACCCCCC AACCTCTGAC ATCGCCGGCC
151 GAACGGGAGC TGCCGCTTCC TTCCCGGCCC CGCTGCACCT CCCCAGGGAG
201 CCGAGGGCGG GCGTGGACGG GACCGACGTG GAACGCATTC TGTAGCCCAG
251 ACGGGCGGCC CCGGCGGCTT CGGGAGTGGG GTCACGCCCA GCTGGAGAAG
301 CAGTTAGGGC GGACGAAGCA GGAGCCGCGG GGCTGGGAGG ATTCCAGTCG
351 GAACGCAACC GATCCTGGGG AGGCGAGAGG TGAATCAACC TGGACCCTTC
401 CACAGCCTGG CTGCTAGGCC AGCAGTGCGA CTCCCTTCCG AGCTGAGCTT
451 ACCCTGGGCG CAAACGAGCG AGGCAGGGGC GCGAGTGGAA GCTGGAGTTC
501 CGGGGTGGGC GGGGAGGCGA CTGTCCGTGG TGCTGAGCGC CGGCGAGAGC
551 GGGCGCGGAG CGGCTGATCA GCTCCCTCGA ACTGGGGAGG TCCAGTGGGG
601 TCGCTTAGGG CCCAAAGCCC CCGCCCGGCT CCAAAAGCTC CCAGGGCCTC
651 CCCAGGCACC GGTGCTCGGC CCTTCCTTCG GTCAGAAAGT CGCCCCCTGG
701 GGGCAGTTCG TCCCAAAGGG TTTCCTCGAA AGAATCTGAG AGGGCGCAGT
751 CCTTGACCGA GGGAATCTCT CTGTGTAGCC TTGGAAGCCG CCAGCCCCAG
801 AAGATGCCTG CCTTCAATAG ATTGTTTCCC CTGGCTTCTC TCGTGCTTAT
851 CTACTGGGTC AGTGTCTGCT TCCCTGTGTG TGTGGAAGTG CCCTCGGAGA
901 CGGAGGCCGT GCAGGGCAAC CCCATGAAGC TGCGCTGCAT CTCCTGCATG
951 AAGAGAGAGG AGGTGGAGGC CACCACGGTG GTGGAATGGT TCTACAGGCC
1001 CGAGGGCGGT AAAGATTTCC TTATTTACGA GTATCGGAAT GGCCACCAGG
1051 AGGTGGAGAG CCCCTTTCAG GGGCGCCTGC AGTGGAATGG CAGCAAGGAC
1101 CTGCAGGACG TGTCCATCAC TGTGCTCAAC GTCACTCTGA ACGACTCTGG
1151 CCTCTACACC TGCAATGTGT CCCGGGAGTT TGAGTTTGAG GCGCATCGGC
1201 CCTTTGTGAA GACGACGCGG CTGATCCCCC TAAGAGTCAC CGAGGAGGCT
1251 GGAGAGGACT TCACCTCTGT GGTCTCAGAA ATCATGATGT ACATCCTTCT
1301 GGTCTTCCTC ACCTTGTGGC TGCTCATCGA GATGATATAT TGCTACAGAA
1351 AGGTCTCAAA AGCCGAAGAG GCAGCCCAAG AAAACGCGTC TGACTACCTT
1401 GCCATCCCAT CTGAGAACAA GGAGAACTCT GCGGTACCAG TGGAGGAATA
1451 GAACAGGAGC AGTGTGACAT GAGGTGGCCT GAACACCTGA GGGACTGGAC
1501 ATCCCATGTT CAGCAATGTC AATGGCATCA GGAGGGCGCC CCAAGGGCCC
1551 CATCGCTTCC CTTCATGCAT CCATTGTTCT GTTCATTCAT TCATCCATAC
1601 ATCCACCTGC CTCTGAGCTT TCACCTCTGA CTCCCTAACT CCATCAGACC
1651 TCTACGCACC ATAAGACTCT GCCAGAACTG AGAAGCCAAC ATTTCTACAT
1701 AGACTCAACC TCACCCTCTC CTAGTTTTCC AACAAGACAC TCCAAAGCCA
1751 ACTGGATTTC TCCCCTGTGC TCCAAATGAC TTTGTACAAG TGCTGGAGTT
1801 AGCACCTCCC TCTGCCCTTA ACTGGCTGGA ACTGGTTCAT TCTCCATTAC
1851 TGCAAGAGAA TGGAAGTCTT AATAGAAGGA AGCAGGAGTG ATTAGTTCGG
1901 GTTAAAGCAA AAGTGTGTCA TGAACTTGGA TTCCCTGAAG TCAGTTTTGT
1951 CAGGTTCATG GCCCACTTTG CTACAGCATC AGAGTGAAGC ACGCCTGTCT
2001 AGGTTCTCCA GTGACAGAAA GATCCTGAAG CATGGACTAA CATGCTCTCT
2051 GGAGCTTAGT ACTCCAGAGC TAGATCCTGA TGGGTCTCTA AGGTTCCCTC
2101 CAAGAAGACA AGGACAGGAG ACTTGGGAAG GACCAATGGT AATTTAAGTG
2151 GCTCTTAAAA AGTCATGCAA CATGTTTCTG GACACGTTCC TGATCCTATT
2201 GCGATAATGT ATGTGTGCCC TCCCTGTGGG CACACCACCT GGGCATTAGG
2251 ACTGAAATTC CTGAGTTCTT CCTCTCAAAA TTTCTGTGCA CCAGTATTAT
2301 TCCTCATTTT ACATACAGGA GGCAACTAAG ACTCATACAG GGCTCAACTG
2351 AATAAGAGGC TTAAGAGGAT AAACTGGAGC AGAAATAAGC CTTAGGTGCT
2401 GCCCAGTTTA CACTTCCTGG GATGGATGTT TTTGTTTGTT TTGTTTTTTG
2451 TTTTTTTTGT TTGAGATGGA GTCTCACTCT GTCACCTAGG CTAGAGTGCA
2501 GTGGTGTGAT CTCGGCTCAC TGCAACCTCT GCCTCTTGGG TTCAAGCAAT
2551 TCTCATGCCT CGGCCTCTCC AGTAGCTGGG ATTACAGGTG TGCACCACCA
2601 CGCCTGGCTA AATTTTGTAT TTTTAGTACA GACAGGGTTT GACTATGTTG
2651 GCCAGGCTAG TCTTGAACTC CTGACCTCAA ATGACCCACC CACCTCAGCC
2701 TCCCAAAGTG CTGAGATTAC AGGCGTGAGG CACTGCGCCC GGTGGATAAC
2751 TTTGTTTCTG AAAAGACTGA CATTGAACTT GTCTATGGCA ATGCTTCTTT
2801 CACAAGCACG GACTGGGCTG AGGTCAACTC TGATAGATTC AGATGACTAG
2851 AAATTGGCCA AAAAAGCAGG GAGAAGAACA TGAGGTAGAC TTAAAGAACT
2901 TCCTTTATGT AAAGATCTGT GACTCTGAAA TATCCTCCAA AAGGAGAGTG
2951 CATCTGAGAC TGATATTTAA ACTAAGAAAA ATGTTTAGTC TGAGATGGAT
3001 CATAAGTAAA TGAGCAGTGT GAGAGGGGAG GGATGGGTAG GTGCTTTCCA
3051 AATACTTCGC CTATGAATGC ATAATTTTCA GATTTTTTTC CCCTAGATTT
3101 TGAGGGAGCA GAGAAACTGG AAAAAACTTT AGTCAATATC TCGTGTTTCA
3151 TTTTAATTAA GTGACAGGTC CAAGTGTGAC ATCCTTCAGC ACCCAGGGAC
3201 AAGAGAGGGG AAAGATGCTT TATGGAATGT AAGAAGATGA AGGTGACTGG
3251 GATTCAGCGA GAGAGAGGTC CCTCAGACCT GGGACCTCCC TTTATAGGGA
3301 AAGACCATAT TCCATAGGTT TAGGGCTTTA CCTTAAAAGC TCATTTTTTT
3351 CATTCTTCCA TCCCTAGGAA AGTACTTAAA ACCAGACTTT TAAATTTTTA
3401 TTTATTTATT ATTATTTTTT TGAGACAGAT TCTCACTCTG TCTCCCAGGC
3451 TAGAGTGCAG TGGTGCAATC TCAGCTCACT GCAGCCTCAA CTGCCCCAGG
3501 TTTAAGCAAT CCTCCCACCT CAGCCCCCAG GTAACTGGGA CTACAGGCAT
3551 GCACCACCAT GCCTGGCTAA TTTTTGTATT TTATGTAGAG ACAGGGGTCT
3601 TGCCATGTCG CCCAGGCTGA TCTTGAACTC CTGGGCTCAA GCAATCTGCC
3651 AGCCTCAGCC TCTCAAAGTG CTGGGATTAC AGGCCTGAGC AACTGTGCCT
3701 GGCCCAAAAC CAGACCGTTA ACACATTAAA GAGTCTGATT TTGTTGAAGA
3751 AAATATTTGC AATAAATTCA AGACTCTTCT TATTGGTAAT TTTCCACACA
3801 ATCCCTCTGA AATAAGGGAG AGGATATAGA CCTTTTTAAC TTTATAGTTA
3851 GAAAAATTGG CCTCAGTGTG AAATTTTTCC AGTCCCATAG CTCATGGATG
3901 CCACCAGCTT GCGGTAGTAG CAAGATGCTT ACTACCACAC CGTTTTCCTC
3951 GGTGGCCCAA TAGCTCGTGT ATCTAAGTTG AACCCGGCAG TATGCATGAT
4001 TGCCTTTTTC TCTTCTTTTT AAAAAAACCC AACTCAAAAA AAAAAAAAAA
4051 AA
BLAST Results

No BLAST result

Medline entries 




12271507:
Isom LL, De Jongh KS, Patton DE, Reber BF, Offord J, Charbonneau H,
Walsh K, Goldin AL, Catterall WA. Primary structure and functional
expression of the beta 1 subunit of the rat brain sodium channel.
Science 1992 May 8;256(5058):839-42.

1fa2351Sl:
Belcher SM, Howe JR. Cloning of the cDNA encoding the sodium channel
beta 1 subunit from rabbit. Gene 1996 May 8;170(2):285-6.

1335771b:
McClatchey AI, Cannon SC, Slaugenhaupt SA, Gusella JF. The cloning and
expression of a sodium channel beta 1-subunit cDNA from human brain.
Hum Mol Genet 1993 Jun;2(6):745-9.



Peptide information for frame 3



ORF from 804 bp to 1448 bp; peptide length: 215
Category: similarity to known protein
Classification: Transmembrane proteins unclassified



1 MPAFNRLFPL ASLVLIYWVS VCFPVCVEVP SETEAVQGNP MKLRCISCMK

51 REEVEATTVV EWFYRPEGGK DFLIYEYRNG HQEVESPFQG RLQWNGSKDL

101 QDVSITVLNV TLNDSGLYTC NVSREFEFEA HRPFVKTTRL IPLRVTEEAG

151 EDFTSVVSEI MMYILLVFLT LWLLIEMIYC YRKVSKAEEA AQENASDYLA

201 IPSENKENSA VPVEE



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2f18, frame 3


PIR:JCM7flfi sodium channel protein betal chain - rabbit, N = li 
Score - 

13M-, P = fl.3e-Hl 
PIR:AS573M sodium channels voltage-gated, beta-1 chain precursor -
humani N = It Score = MSfln P - 3-be-MO 


PIR:AH273? sodium channel beta 1 subunit - rati N = l-i Score = 

42T-. P = 

E.fie-MD 


>PIR:JC l l7afi sodium channel protein betal chain - rabbit 
Length "= Hlfl 

HSPs: 


Score = M3M (bS-1 bits)t Expect = fl.3e-mi P = fl.3e-m 
Identities = 100/214 (Mb*)-. Positives = 121/214 (bD*> 

fluery: ID 

LASLVLIYWVSVCFPVCVEVPSETEAVflGNPMKLRCISCMKREEVEATTVVEWFYRPEGG bl

LA +V VS + CVEV SETEAV G K+ CISC +R E AT 

Elil +R +G 
Sbjct: 5 

LAFVVGAALVSSAUGGCVEVDSETEAVYGMTFKILCISCKRRSETTAETFTEIilTFRtfKGT b4 


fluery: 7D KDFL-IYEYRNGH<2EVESP--Ft2GRL<2UNGS 

KDLflDVSITVLNVTLNDSGLYTCNVS 123 

++F+ I Y N ++E F+GR+ UNGS KDLUD+SI + NVT N 

SG Y C+V 
Sbjct: bS

EEFVKILRYENEVL(2LEEDERFEGRVVIiJNGSRGTKDL(3DLSIFITNVTYNHSGI>Yi3CHVY 124 
<2uery: 124 

REFEFEAHRPFVKTTRLIPLRVTEEAGEDFTSVVSEH1HYIXXXXXXXXXXIEHIYCYRK lfl3 
R FE + + I L V ++A D S+VSEIMMY+

EM+YCY+K 
Sbjct: 12S 

RLLSFENYEHNTSVVKKIHLE WDKANRDnASIVSEIMflYVLIVVLTIULVAEIIVYCYKK lfl4 

fluery: 184 VSKAEEAA-flENASDYLAIPSENKEN-SAVPVEE SIS

++ A EAA (2ENAS+YLAI SE+KEN + V V E 
Sbjct: IfiS IAAATEAAAflENASEYLAITSESKENCTGVdVAE 21fl 
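An HSP's Identities line counts the aligned positions where query and subject carry the same residue. The sketch below recomputes that count from two aligned strings. It assumes gaps are written as '-' and truncates the percentage toward zero. Truncation fits the reported 100/214 (OCR'd as Mb*, presumably 46%), since 100 × 100 / 214 ≈ 46.7, but the report does not state its rounding rule. The function name is illustrative.

```python
def hsp_identity(query_aln, sbjct_aln):
    """Return (identities, alignment length, integer percent identity).

    Gap characters ('-') never count as identities; the percentage is
    truncated toward zero (an assumption about the report's rounding).
    """
    if len(query_aln) != len(sbjct_aln):
        raise ValueError("aligned strings must have equal length")
    ident = sum(1 for q, s in zip(query_aln, sbjct_aln) if q == s and q != "-")
    return ident, len(query_aln), 100 * ident // len(query_aln)

# First six aligned residues of the query and subject lines above.
print(hsp_identity("LASLVL", "LAFVVG"))  # (3, 6, 50)
print(100 * 100 // 214)                  # 46
```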



Pedant information for DKFZphamy2_2f18, frame 3



Report for DKFZphamy2_2f18.3


CLENGTH3 215 
EllliO 247Q2-40 
Epll 4-bl 

EHOMOLJ PIR:JCM7fia sodium channel protein betal chain - 

rabbit 3e-41

EBLOCKSJ BL00401D Prokaryotic sulf ate-binding proteins 
CBLOCKSl BPDDS7D 






ESC0P3 dlneu E-l-l-l-l Myelin membrane adhesion 

molecule P0 Era Ee-i+3 

EPIRKUJ Schwann cell Ee-07 

EPIRKUJ transmembrane protein le-MO 

EPIRKUJ myelin Ee-07 

EPIRKUJ phosphoprotein 5e-D7 

EPIRKUJ glycoprotein le-40 

EPIRKUJ structural protein Ee-07 

EPIRKUJ muscle le-HO 

EPIRKUJ membrane protein Se-07 

ESUPFAMJ immunoglobulin homology Ee-07 

ESUPFAMJ myelin P0 protein Ee-07 

EPFAMJ IG (immunoglobulin) superfamily 

EKUJ All_Beta 

EKUJ 3D 

EKUJ SIGNAL_PEPTI»E S3 

EKUJ L0U_C0MPLEXITY M-bS X 



SE<3 MPAFNRLFPLASLVLIYUVSVCFPVCVEVPSETEAVflGNPMKLRCISCMKREEVEATTVV

SEG 

Ineu- CEEEECCEEEETTTbCEEECE- 

EEECCCCCCCCCEE 

SE(3 ElilFYRPEGGKDFLIYEYRNGHflEVESPFflGRLfllilNGSKDLlJDVSITVLNVTLNDSGLYTC

SEG 

Ineu- 

EEEEEETTTCCCEEEEEETTEEEETTTTTTTEEECCBGGGCBCCEEECCbTTTTTEEEEE 

SE<2 NVSREFEFEAHRPFVKTTRLIPLRVTEEAGEDFTSVVSEIMMYILLVFLTLULLIEMIYC

SEG xxxxxxxxxx 

lneu- 

EE 

SEfl YRKVSKAEEAAflENASDYLAIPSENKENSAVPVEE

SEG 

Ineu- 



(No Prosite data available for DKFZphamy2_2f18.3)



Pfam for DKFZphamy2_2f18.3


HMM_NAME IG (immunoglobulin) superfamily 
HUM 

*yrNgqpipssegyUytRweqqgRYsisif qLtlisUepeDsGtYUCmV* 
YRNG ++ E+ ++ R++++G ++ +++T+ +++ +DSG

Y+C+V 

fluery 77 YRNGHflEV— 

ESPFdGRLflUNGSKDLflDVSITVLNVTLNDSGLYTCNV 1EE 

group: nucleic acid management

DKFZphamy2_2f22 encodes a novel 479 amino acid protein with
similarity to YDL153c of Saccharomyces cerevisiae.

The novel protein is ubiquitously expressed. YDL153c is involved
in transcriptional silencing. 

The new protein can find application in modulation of
transcription, e.g. transcriptional silencing.


putative protein 

probably complete cds. 
perhaps differential polyadenylation

YDL153c is involved in transcriptional silencing

Sequenced by MediGenomix 

Locus: /map="M"

Insert length: 2019 bp

Poly A stretch at pos. 2000, polyadenylation signal at pos. 1981


1 GGAGTCTGCA AACTCCGGTG GTAGGGGAGC GCGCTGCTGT TTAGAGCCAC
51 GAGTTACCGG AGCGCCTGAT TCCTGCGCCG AAGTCAGTGG TGGCCGAAAG
101 TCCGGAGTCG CTGTAAAACC TGAGATTGTG AGCCATGGTG GGGAGATCCC
151 GGCGGCGCGG AGCAGCTAAG TGGGCAGCTG TGCGAGCCAA GGCAGGTCCC
201 ACGCTCACCG ACGAAAATGG AGATGATTTA GGATTGCCAC CCTCACCAGG
251 GGACACCAGC TACTACCAAG ATCAGGTAGA TGACTTTCAT GAGGCACGAT
301 CCCGGGCCGC CTTAGCTAAG GGCTGGAATG AAGTACAGAG TGGAGACGAG
351 GAGGATGGCG AGGAGGAGGA GGAGGAGGTG CTAGCCCTAG ATATGGACGA
401 TGAGGACGAC GAAGATGGAG GGAATGCGGG GGAGGAGGAG GAGGAGGAGA
451 ATGCCGATGA TGATGGTGGG AGCTCCGTGC AAAGTGAAGC TGAGGCCTCT
501 GTGGATCCCA GTTTGTCGTG GGGTCAGAGG AAAAAACTTT ACTATGACAC
551 GGACTATGGT TCCAAGTCCC GAGGCCGGCA GAGTCAACAG GAGGCAGAGG
601 AGGAGGAAAG AGAGGAGGAG GAGGAGGCAC AGATCATTCA GCGGCGCCTA
651 GCCCAAGCGC TGCAAGAGGA TGATTTTGGT GTCGCCTGGG TTGAGGCCTT
701 TGCAAAACCA GTGCCTCAGG TAGATGAGGC TGAGACACGG GTCGTGAAGG
751 ATTTGGCTAA AGTTTCAGTG AAAGAGAAGC TGAAAATGTT GCGAAAGGAA
801 TCACCAGAAC TCTTGGAGCT GATAGAAGAC CTGAAAGTCA AGTTGACAGA
851 GGTTAAGGAT GAGCTGGAGC CATTGTTAGA GTTGGTGGAA CAAGGGATCA
901 TTCCACCCGG AAAAGGAAGC CAATACTTGA GGACCAAGTA CAACCTCTAC
951 TTGAATTATT GCTCGAACAT CAGTTTTTAT TTGATCCTGA AAGCTAGGAG
1001 AGTCCCAGCA CATGGACATC CTGTCATAGA AAGGCTTGTT ACCTACCGAA
1051 ATTTGATCAA CAAGCTGTCC GTTGTGGATC AGAAGCTGTC CTCAGAAATT
1101 CGTCATCTGT TGACACTTAA GGATGATGCT GTAAAGAAAG AACTGATTCC
1151 AAAAGCAAAA TCCACCAAGC CCAAACCAAA GTCTGTTTCA AAGACTTCTG
1201 CTGCTGCCTG TGCTGTTACA GATCTTTCTG ATGATTCTGA TTTTGATGAA
1251 AAAGCAAAAC TGAAGTACTA TAAAGAAATA GAAGACAGGC AAAAGCTAAA
1301 GAGAAAGAAA GAAGAAAATA GCACTGAAGA ACAGGCTCTT GAAGATCAAA
1351 ATGCAAAGAG AGCTATTACC TATCAAATTG CTAAAAATAG GGGACTTACT
1401 CCTAGGAGAA AGAAGATTGA TCGCAATCCC AGAGTGAAAC ACAGAGAGAA
1451 GTTCAGAAGA GCCAAAATTA GAAGAAGAGG CCAGGTTCGT GAAGTTCGTA
1501 AAGAAGAGCA ACGTTATAGT GGTGAATTAT CTGGCATTCG TGCAGGAGTT
1551 AAAAAGAGCA TTAAGCTTAA ATGAAGTTTT TGCTTAGCAT AAGGTTTTTG
1601 GCAGTTTTGG ATCAATAAAT TTTTACTTTT AACTAAAGTC ATTGTATTAA
1651 TATATAATAC TTTAAATTTT AAAAATTCTT GTCCACAAGG AAATTTGTCT
1701 GGGTTATTGG ACAATTTATA AGAACTATGG GAGCAATATG AAGGTGCTTG
1751 AGAAAAGAGA TGATGTTGAA GTTTTCCAAT ATTCTGTTGA AGTTTTCCAA
1801 TATTAAGTAT TAGCTTAGGG AAATTTCACA GTTCATTGTG GAGTGTTAAA
1851 CTTAGAACAT GTGTAACTTT TCACATAAAG AGAATGCATC TTTGACAGTT
1901 ATCTTATTTG TAAGGCAGCC TATAAAATAG TTCTGAAGTA TTTTATTTAC
1951 CTAACTATAA TTATTGGGCC AGATACTTGT TAATAAATGG GCTTAATGTC
2001 AAAAAAAAAA AAAAAAAAA


BLAST Results 



No BLAST result 


Medline entries 



No Medline entry



Peptide information for frame 3 

ORF from 135 bp to 1571 bp; peptide length: 479
Category: similarity to unknown protein
Classification: Nucleic acid management




1 MVGRSRRRGA AKWAAVRAKA GPTLTDENGD DLGLPPSPGD TSYYQDQVDD
51 FHEARSRAAL AKGWNEVQSG DEEDGEEEEE EVLALDMDDE DDEDGGNAGE
101 EEEEENADDD GGSSVQSEAE ASVDPSLSWG QRKKLYYDTD YGSKSRGRQS
151 QQEAEEEERE EEEEAQIIQR RLAQALQEDD FGVAWVEAFA KPVPQVDEAE
201 TRVVKDLAKV SVKEKLKMLR KESPELLELI EDLKVKLTEV KDELEPLLEL
251 VEQGIIPPGK GSQYLRTKYN LYLNYCSNIS FYLILKARRV PAHGHPVIER
301 LVTYRNLINK LSVVDQKLSS EIRHLLTLKD DAVKKELIPK AKSTKPKPKS
351 VSKTSAAACA VTDLSDDSDF DEKAKLKYYK EIEDRQKLKR KKEENSTEEQ
401 ALEDQNAKRA ITYQIAKNRG LTPRRKKIDR NPRVKHREKF RRAKIRRRGQ
451 VREVRKEEQR YSGELSGIRA GVKKSIKLK



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_2f22, frame 3

PIR:S67701 hypothetical protein YDL153c - yeast (Saccharomyces
cerevisiae), N = 4, Score = 134, P = 1.8e-08




PIR : TDflblM hypothetical protein DKFZpSb40D12 • 1 - human 

(fragment)-i N = 

li Score = IMli P = 5-fle-Q7 

TREHBL : SPBC3Bfl_1 gene: "SPBC3BA . 01"i product: "hypothetical 
protein"i 

S.pombe chromosome II cosmid c3Bfl«T N = 2i Score = IbMi P = b-2e- 
13 



>TREI1BL:SPBC3Bfl_1 gene: "SPBC3B6.D1"i product: "hypothetical 
protein", 

S-pombe chromosome II cosmid c3Bfi- 
Length = 517 

HSPs: 

Score = IbM (24. b bits)! Expect = b-2e-13-, Sum P(2) = b.2e-13 
Identities = m/12b (3MX) ■> Positives = bfl/12b (53*) 



fluery: 3b7 DSDFDEKAKLKYYKEIEDRflKLKRK-KEEN STEEdALE- 

DflNAKRAITYU Mm 

D + +++ L YY+ ++ + K+ +K ++EN S + +E + 

KR IT 
Sbjct: i*72

DREVEDflDDLDYYESLDKKSKMAKKLRKENHDLERDLIRASRHPELIELGEGDKRCITLD 531 

(Juery: MIS IAKNRGLTPRRKKIDRNPRVKHXXXXXXXXXXXXGflVREVRKEEfiR- 
YSGELSGIRAGVK M73 

IAKNRGLTPRR K +RNPR+K + + (2 Y+GE

+GI+AG+ 
Sbjct: 532 

IAKNRGLTPRRPKENRNPRLKKRHRYEKAKKKLASKKAIYKGAPflGGYAGEflTGIKAGLV 511 

fluery: 47H KSIKLK 471

KSIKL+ 
Sbjct: 512 KSIKLfl 517 

Score = SO (12. □ bits)-. Expect = b-2e-13-. Sum P(2) = b-2e-13 
Identities = 21/1E1 (2EX)-. Positives = bb/lEI (51*)

fluery: 117 DEAETRVVK-DLAKVSVKEKLKMLRKESP — ELLELIE 

DLKVKLTEVKDELEPLLE E41 

D ++ + +K D + +++E ++ + + P ELL+++E + ++ L E+ 

++L+P L

Sbjct: 173 DNSDLKSIKflDSSAAAIEELVfifllSPDLPRTELLKILEAKHPEFtfLFLDEL- 
N(2LKP<2LN E31 

fluery: E5D LVEdGIIPPGKGSOYLRTKYNLYLNYCSNISFYL- 
ILKARRVPAHGHPVIERLVTYRNLI 3Dfi

+++ + Sfl L+ + Y S ++FY +LK HP++ 

LV + 

Sbjct: E32 EIKEKL- 

KTYPSS(3LL(JAC2CTALSTYISFLTFYFALL<DGEEI>LKNHPH1VDLVRCK(3T1J 210 



fluery: 301 NKLSVVDtJKLS 311 

+D+ L+ 

Sbjct: 211 ESYCGLDEVLT 301 
Score = (fl-T bits)-. Expect « T-Ee-lli Sum P(2) = T-Ee-ll
Identities = lfi/ST (30*)-. Positives = 35/51 (SIX) 

Query: lib VDEAETRVVKDLAKVSVKEKLKMLRKESPEL

LELIEDLKVKLTEVKDELE — PLLEL 250 

++E ++ DL + E LK+L + PE L+ + LK +L 
E+K++L+ P +L 

Sbjct: Ifll IEELVfifilSPDLPRT 

ELLKILEAKHPEF<2LFLDELNiJLKP<2LNEIKEKLKTYPSS<3L 245

fluery: SSI VE 252 
++ 

Sbjct: 2Mb L<2 247 






Score = 57 (fi-t. bits)-. Expect = 3.0e-Dl-. Sum P(2) = 2-be-Ql 
Identities = 13/Sfl (22*)-, Positives = 2b/5fl (44X) 



fluery: 3L.7 DSDFDEKAKLKYYKEIEDRflKLKRK — 
KEENSTEE<2ALEDl2NAKRAITY(2IAKNRGLT 422

J> + +++ L YY+ ++ + K+ +K KE + E + I 

RG+T 

Sbjct: 472 

DREVEDflDDLDYYESLDKKSKNAKKLRKENHDLERDLIRASRHPELIELGEGDKRGIT 521 



Score = 42 (b-3 bits)-! Expect = S.2e-DTi Sum P(2) = 5-2e-01 
Identities = 13/51 (25*)-, Positives = 21/51 (SbX) 



fiuery: m AETRVVKDLAKVSVKEKLKMLRKESPE — 
LLELIEDLKVKLTEVKDELEPLLE 241

+ET + D+++ + LK ++++S + EL++ + L + EL 

+LE 

Sbjct: lbD SETDAIDDISflUADNSDLKSIKfJDSSAAAIEELVflCJISPDLP — 
RTELLKILE 210 



Score = 31 (5-1 bits)-. Expect = 1-le-Dfl-, Sum P(2) = 1-le-Dfl 
Identities = fl/lfl (44*) ■> Positives = 11/lfl (bl*) 



Query: 43 YYQDQVDDFHEARSRAAL 60
          +Y +Q+D RSRA L

Sbjct: 402 FYAN<2ID(2KAAKRSRAVL 419
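Each percentage after an Identities or Positives count is that count divided by the number of aligned columns. Where the OCR-damaged figures can still be read, they appear to be truncated rather than rounded: a positives count of 26 over 58 columns is shown as 44%, not 45%. The helper below is a minimal sketch that assumes this convention. It is inferred from the report figures and is not taken from any BLAST documentation.

```python
def blast_percent(count: int, aligned_columns: int) -> int:
    """Integer percent identity/positives, truncated toward zero.

    Truncation instead of rounding is an assumption inferred from the
    figures in these reports, not a documented BLAST rule.
    """
    if aligned_columns <= 0:
        raise ValueError("alignment length must be positive")
    if not 0 <= count <= aligned_columns:
        raise ValueError("count must lie between 0 and the alignment length")
    return (100 * count) // aligned_columns


# 8 identical and 11 positive columns in an 18-column alignment.
print(blast_percent(8, 18), blast_percent(11, 18))  # prints: 44 61
```

Integer floor division avoids any floating-point edge cases at exact percentage boundaries.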



Pedant information for DKFZphamy2_2f22, frame 3
Report for DKFZphamy2_2f22.3



[LENGTH] 471

EI1U3 545SS • DO 

Ipll 5-50 

EH0H0L3 TREMBL :SPBC3BS_1 gene: "SPBC.3BA - 01" i product: 

"hypothetical protein"} S-pombe chromosome II cosmid c3Bfl. Ie-1D 

55 EFUNCAT3 04 • D5 • 01 • D4 transcriptional control IS. cerevisiae-i 

YDL153c3 le-Ofl 
IBL0CKS3 PR0DS2a]> 

EBLOCKS J BLDD3b0C Ribosomal protein SI proteins 
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EBLOCKSJ 
[BLOCKS! 
EBL0CKS3 
[BLOCKS! 
proteins 
EKbD 

[KliO 



BLDDTtMA Syndecans proteins 

PRDObEMG 

PRDDAEAH 

BL0D6SMB Elongation factor 1 beta/beta ' /delta chain 



All_Alpha 

L0U_C0riPLEXITY 

C0ILEDC0IL 



21. fc.3 •/. 
7. ID '/. 






SE<2 MVGRSRRRGAAKUAAVRAKAGPTLTDENGDDLGLPPSPGDTSYYflDflVDDFHEARSRAAL 

SEG xxxxxxxxxxxxxxxx 

PR]> cccccchhhhhhhhhhhhhccccccccccccccccccccccccccchhhhhhhhhhhhhh 
COILS 



SEfl AKGldNEVtfSGDEEDGEEEEEEVLALDIIDDEDDEDGGNAGEEEEEENADDDGGSSVlJSEAE 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhcccccccccccchhhhhhhhhhhhhhccccccccchhhhhhhhhhccccccchhhhhh 
20 COILS 



SEfl ASVDPSLSUG(3RKKLYYDTDYGSKSRGR(2S£3<aEAEEEEREEEEEA(2II<2RRLA(2AL<2EDD 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

25 PRD hcccccccccccceeeecccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhcc 
COILS 



SE<2 FGVAUVEAFAKPVPflVDEAETRVVKDLAKVSVKEKLKNLRKESPELLELIEDLKVKLTEV 

30 SEG 

PRD chhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

35 SE<2 KDELEPLLELVE(2GIIPPGKGS<2YLRTKYNLYLNYCSNISFYLILKARRVPAHGHPVIER 

SEG 

PRD hhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccc 
COILS 

ccccc 

40 

SE<3 LVTYRNLINKLSVVDfiKLSSEIRHLLTLKDDAVKKELIPKAKSTKPKPKSVSKTSAAACA 

SEG xxxxxxxxxxxxxxxxxxxx. • 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhcccccccchhhhhhhhh 
COILS 

45 

SE(3 VTDLSDDSDFDEKAKLKYYKEIEDRl3KLKRKKEENSTEE<3ALED<3NAKRAITY(2IAKNRG 

SEG 

PRD hhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
50 COILS 



SE<2 LTPRRKKIDRNPRVKHREKFRRAKIRRRGflVREVRKEEflRYSGELSGIRAGVKKSIKLK 

SEG xxxxxxxxxxxx 

55 PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhccc 
COILS 
(No Prosite data available for DKFZphamy2_2f22.3)
(No Pfam data available for DKFZphamy2_2f22.3)
group: nucleic acid management

DKFZphamy2_2gl2 encodes a novel 191 amino acid protein with
similarity to NVL-2 of Rattus norvegicus.

The novel protein contains 3 EF-hand calcium-binding domains. The
related human protein VILIP, a Ca2+-dependent protein, specifically
binds the 3'-untranslated region of the mRNA encoding the
neurotrophin receptor trkB, an mRNA that is localized to
hippocampal dendrites in an activity-dependent manner. The new
protein exhibits elevated expression in brain and testis.

The new protein can find application in studying the expression
profile of brain-specific genes and as a new marker for neuronal
cells.

strong similarity to NVL-2 (Rattus norvegicus)

Comment for P35332:
FUNCTION: MAY BE INVOLVED IN THE CALCIUM-DEPENDENT REGULATION OF
RHODOPSIN PHOSPHORYLATION.

TISSUE SPECIFICITY: NEURON-SPECIFIC IN THE CENTRAL AND PERIPHERAL
NERVOUS SYSTEM.

MISCELLANEOUS: PROBABLY BINDS TWO OR THREE CALCIUM IONS (BY
SIMILARITY).

SIMILARITY: TO OTHER EF-HAND CALCIUM BINDING PROTEINS. BELONGS TO
THE RECOVERIN SUBFAMILY.

Sequenced by MediGenomix

Locus: /chromosome="1"

Insert length: 4285 bp
Poly A stretch at pos. 4258, polyadenylation signal at pos. 4217



1 GGCGGCTCCG GCGCAGACCT TGGAGAGCAC AGCTGCCGGC CCGCGAGCCA
51 GCCTCGGTTC CCGCGGCCCG CCGAGGCTCG GAGCCATCCA GCGACCCGGC
101 GACCGGCCTC AGGCCCCGCC ATGGGGAAGA CCAACAGCAA GCTGGCCCCC
151 GAGGTGCTGG AGGACCTTGT TCAGAACACT GAGTTCAGCG AGCAGGAGCT
201 GAAGCAGTGG TACAAGGGCT TCCTGAAGGA CTGCCCCAGC GGCATCCTCA
251 ACCTGGAGGA GTTTCAGCAG CTCTACATCA AGTTCTTCCC CTACGGCGAC
301 GCCTCCAAGT TCGCGCAGCA CGCTTTCCGC ACCTTCGACA AGAACGGCGA
351 CGGCACCATC GACTTCCGGG AGTTCATCTG CGCCCTGTCG GTCACCTCCC
401 GCGGCAGCTT CGAGCAGAAG CTCAACTGGG CCTTTGAGAT GTACGACCTG
451 GACGGCGACG GGCGAATCAC GCGCCTGGAG ATGCTGGAGA TCATCGAGGC
501 AATCTACAAG ATGGTGGGCA CCGTGATCAT GATGCGCATG AACCAGGACG
551 GGCTCACGCC CCAGCAGCGT GTGGACAAGA TCTTCAAGAA GATGGACCAG
601 GATAAGGACG ACCAGATTAC ATTGGAGGAG TTCAAGGAGG CAGCCAAGAG
651 TGACCCATCC ATTGTGTTGC TGCTGCAGTG TGACATGCAG AAGTAGAAGC
701 TGGTGAGGGG CAGGGTCCCT GGCCAGAAGG GGCATGGCCA CCTCCCAACC
751 TGATGACCTC TCTGGCTGGC CTCCCAGGAG GAGGGACACT CCAGCCCCCC
WO 01/98454 PCT/IB01/02050

801 TCTCTGGCCC ACCCAGTCCT CTGCCCAAGC CCTTCCTCCC CTCCATCAAG
851 ATCTTTGAGG GACCACCTCA CCCTGCAAAA GAGACAGGTC CTCCAGTACC
901 CTGTCTTCTA GCCCCACCTC CCACTTGGCC AGAACCAATG TCCATTGGGC
951 ATAGGGGAGT TGGCTTTTGC CCCAGGAGGT GAGGTTAAGG AGTTGGGGGC
1001 CTGGGGTTCT GGTTAGGAAT TCTCTTGATC CTGGGATTAT GCTTTATAGG
1051 ATGTGGTCCC ACAGGCCTGT CACAGGGCCA AATTGGGTCT GTCCATTCCT
1101 GAGGCTCCAG ATCCCATAAA GGGGGTCTCT TCCCCATCCC TTCTACTCTA
1151 CCTGGCCCTT CCAGCCCCAG CCTTTGGAGC GTTCATTCAG TCCTTTCTTC
1201 AGCTAATGAT TACTGAGCAC CTGTTTGGTG CTAAGGATAT GGTCATTTAC
1251 AAGACACATC TTGTGCCCTC TGGAAGCTCA TAGGGTTGTG AGGCAAACTT
1301 CCAGCCGTCA GGGTCTCAGC TAAGCAGAAG GTGCTGGAAG GCTGGTTAGT
1351 CTGGGAGGAG CTATTTCATC TTCCAGCTCA GCTCCACACA AAGCTGCAGA
1401 AGGACGAAAT GAAAAGCATT TGGAAGTTTA GGAGCCACGT GAGTGAAAGT
1451 TTTAAGAAAA ATGAAATTTA TGTCATACTT ATTTTTTTAG TACCCTTTAA
1501 AGGAGCTACA GTCATTTTAT TATTTCAGGA GGTTAAAATA TACTCTATAT
1551 TACTTGGTTT ATTATAAAAT GATTAAATGA ATAGAGAAAA TATTAATTTT
1601 CAAGGGGAAA AAACCTGAGA AGAAAGGGAG AAAAGACCAT GAAATTTACC
1651 AGATAACACT TTTTAAGACT AAGTCCTGAG CTGCCACTCT CAGCAGTTTT
1701 TGCTGCTTCA GCTCTTCCTT TTTATTACCT TTTTCAATTC AACAAGCAAC
1751 TTTCTGCTAC ATACTTACTC CGGTTGGGTG CTGACTTCAG GGACAGGAAA
1801 AAGCAAGGTT TGCAAAGAGT GAAACTAGTG TATATTCCGT ATCTTGGTAG
1851 TTCGTTTCTG GATTGGGTTT AGTTTCAGAA CTGGACTTGT TCCTTCACTG
1901 CCACAGAATC AGAAAGAGCT AGAAGAAAAG GCTCACCTGG CCACTGTTTA
1951 GGCACCCAGA CATAATTTAT GGACGAAATG CCTAAAAATG TGCCAGGCAT
2001 GCTCTGTTTG AGAGGCTTTT TCTAACCCCA AATCTTAGAT CTGCCAGGTA
2051 GTTCAACATC TTCCAAGTGT GCTGGTTCTG CTTTCCAATG CCTGCTTCCC
2101 AATTTTGGAT CCATGAGCTA TACAGCTGCA TGCTTTGACT GCCGGAAAAA
2151 TTAATCTTGC TTCTTCATCA GGTCTTTCTC CTGTACTTGT GATCAGAAAT
2201 TACCTTTGAC GTGCAGTGAC AGTTGATTTC CTCTTGAACT GCCGGTGAAA
2251 ACAGTCTAGT ACACAGGTGC TGTCAGCCCA GGGTGGGAGC AGGAAATGAT
2301 TGCTGAGCCC GGGGCAGGGG AATTGCATCT GCAGGAAAGA GATGCAGCAT
2351 GCTCCTCACT CCTGAGTGCC CACCTGTCCT GCTTCTCTGC AGGTGAAAAC
2401 TCTGGGGGAT GCTGATCAAT AGAGCTTGGT CCCAAGCTCT ACTGGGCCCT
2451 TGGAGGTAGC AAGGCCACTG GGTTGCTATC CTCTTGATGG GGATAGCAAC
2501 CACTGGTTTG CAACCACTGG GTTGCTATCC TTTTGCTATC CTCTTGCTCA
2551 TGACCAGCCA TATGGTGAGG CTGGGGAGTT CACATCCTCA GGCAGGAACT
2601 AGCAGTTGTT TATCCAGCAA TGCCTCAAGG ATGTTGCATT GCTCCCAGGA
2651 GCTGGCTATT AGGTATGTCT TGTGCGGTCA GTCAGCATCA CAGACACATA
2701 GATGCTCACC AGCCTGGCTT AGCTGGGACC TAAATCTTCT GGTGAAAAGC
2751 TTTTCACTAA GTGAGGTTCC TTCCCTGCAA ATGCTGAATC TAGCCTAATT
2801 CGCAACCACA CAGAATTTCA TGGCTTTCAA AGGCTTGCCA TGTGCCCCAT
2851 CTCATTCTAT ACTCACATCC CATGGAGGTG AGGATTTTCA CTTCTTTTCT
2901 CTAGACTTGG AAGCTGAGAT TCAGAGAGGA AGCATCCCTT GTGCAAGATC
2951 ACATAGTCAG GAGGTGACAC AGGGCTAAGA CTTGAACCAA GGCTCTAAGA
3001 GGATTTCTTC TTTTCAGAGT CTCTTCCCTG TCCATTTCTG TGACTAAGCT
3051 GTGCAGAGGT TGACAGCAGG GCAAGTTACA TTGATATTCA TCCTTTATAG
3101 GCTTCCTGCT AAAAAGCTTC TGAGATTGTG GTCTTCCAAA AAAAATAGGA
3151 GCTTGGTTGA AGTCCCCACA TTTTCAAGCA CTCAGTGTTC TGCCTCTGGC
3201 AGCTGTGCTA ACAGCTCAGT GCTGTCCTGG GAGTCCTCTG ACTCAGAACC
3251 CTCGAAGCAT CCTGCATTGT CTTTACCCAC CATCATCGTC ACTAAGAGAA
3301 ACATGCCTAC CCATGAAGGC GTGTTTGATT ACTCCAGGCT TCTGGACACA
3351 CATACCCATG GGTGATTTTT GCTCCTCAGG CCCAATATTC TCAGACAGCC
3401 CAGCAGTGTG AACACACAAT GCCAGGCCAG GAACTGGGAC CACCATCTTG
3451 CTGATGGAAG GAACAACAGG TGGCCCAGGA CATGCTCCTG CATACTCCTG
3501 GGTGTCCCAG GGACTGTGTG CTCAGGAGCA CTGTGGTAGA GCACTGGCCC
3551 TGCCTTGAGA AGAGACACAG GTCTCCCGTC CCTGCACCAG CTGAGAGAGA
3601 CTTGCCACAA AGCACAAGGC TGGCAGAGAT TTATGTATGA CTTGCACAGA
3651 CACAAAAATA TACAGACAAT CAAAACATTG ATATATTCAA ACTCTCCTTT
3701 AAATTCCAAT CTTATTGCAA CAACTCTGTG AATTGCAAGG TCCCAGAATC
3751 TGCCTTCTCA CATACTCTAC CCTCATTCAT CCTTTTGGGC TAATTGATGA
3801 GCATCTTATT TCTTATCTCT AAAAATTATC AGCAAAGGCT ACTTCAGATG
3851 GCCACTTTAG TCCTTTCAGC TGTAGTCAGG ATTATTTAAC TTACCTGTAT
3901 ATCAAAAGTG AAGAAAAAGT TAGTTCATAA GTAAAGGCAC TAAATCCTTT
3951 CCTGACAATG GCAGAGTCTC TAGAGGTAGA AATTTGCCTT GCTGCAGAGA
4001 GAGAAGGAAT GGCGTGGGAT GGGGGAAAGA AAAGAAAGAG AAGAAGAGAA
4051 GAAGCTGGGG TCTCCAGGCA GGGTAGTAAG CTGACACTAA ATATTTTTTA
4101 CACAAAAATG TATTGAAGCA ACAAATATTT CCTGAAGATC CACCCTGGGT
4151 GAGGCTTTGA GCTGACTTTA GAGATCACTG TGGGGTCAAG AATGTCTTAC
4201 ATGTTTTATT CATCATTCTT GAAAAAAGAA ATAATTCAAA CCTTGGAATT
4251 AAAAAGTCAG AAAAACAAAA AAAAAAAAAA AAAAA



BLAST Results

No BLAST result

Medline entries



133b7i|7D: 

Kajimoto Y, Shirai, Mukai H, Kuno T, Tanaka C. Molecular cloning of
two additional members of the neural visinin-like Ca(2+)-binding
protein gene family. J Neurochem 1993 Sep;61(3):1011-6.

1b071121: 

Polymeropoulos MH, Ide S, Soares MB, Lennon GG. Sequence
characterization and genetic mapping of the human VSNL1 gene, a
homologue of the rat visinin-like peptide RNVP1. Genomics
1995;29(1):273-275.



Peptide information for frame 1

ORF from 121 bp to 693 bp; peptide length: 191
Category: strong similarity to known protein
Classification: Protein management
Prosite motifs: EF_HAND (73-85)
EF_HAND (109-121)
EF_HAND (159-171)



1 MGKTNSKLAP EVLEDLVQNT EFSEQELKQW YKGFLKDCPS GILNLEEFQQ

51 LYIKFFPYGD ASKFAQHAFR TFDKNGDGTI DFREFICALS VTSRGSFEQK

101 LNWAFEMYDL DGDGRITRLE MLEIIEAIYK MVGTVIMMRM NQDGLTPQQR

151 VDKIFKKMDQ DKDDQITLEE FKEAAKSDPS IVLLLQCDMQ K
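The ORF coordinates and peptide lengths in these entries are related by simple codon arithmetic. The coordinates appear to be 1-based and inclusive, and they exclude the stop codon: 121 to 693 bp spans 573 bp, which is 191 codons. The sketch below uses the standard genetic code and hypothetical helper names to check that arithmetic and to translate a stretch of an insert. Translating nucleotides 121-150 of the DKFZphamy2_2gl2 insert gives the first ten residues, MGKTNSKLAP.

```python
# Standard genetic code; codons are enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon ('*' marks a stop); a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for 1-based, inclusive ORF coordinates that exclude the stop codon."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


insert_121_150 = "ATGGGGAAGACCAACAGCAAGCTGGCCCCC"
print(translate(insert_121_150))  # MGKTNSKLAP
print(peptide_length(121, 693))   # 191
```

The same arithmetic links the other entries' ORF coordinates to their peptide lengths, for example 61 to 1446 bp giving 462 residues.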



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2gl2, frame 1
No Alert BLASTP hits found 

Pedant information for DKFZphamy2_2gl2, frame 1

Report for DKFZphamy2_2gl2.1



[LENGTH] 231

15 EMU! 21,277-12 

EpIJ S-2b 

CHOnOLU PIR:JHDfil5 neural visinin-like Ca2+-binding 

protein-type 2 - rat Ie-1D7 

[FUNCAT3 1fl classification not yet cleai — cut ES- cerevisiaei 

20 YDR373wJ 3e-S2 

[FUNCAT3 03-01 cell growth IS- cerevisiae-, YKLnOwl 3e-lfl 

[FUNCAT3 03-07 pheromone response! mating-type determination! 

sex-specific proteins ES- cerevisiae-i YKLlTOwJ 3e-lfl 

EFUNCAT1 13- 0*4 homeostasis of other ions IS. cerevisiae-i 

25 YKLllOuO 3e-lfl 

EFUNCAT1 OM-OS-Ol-OM transcriptional control ES- cerevisiaei 

YKLlTOuO 3e-lfl 

EFUNCAT3 30.03 organization of cytoplasm ES. cerevisiaei 

YKLnOwI 3e-lfl 

30 EF0NCAT3 11-01 stress response IS • cerevisiae-i YGRlOOwJ 7e-04 

EBL0CKS3 BL00303B S-100/ICaBP type calcium binding protein 

EBL0CKS3 BLODOlfl 

IBL0CKS3 PR00450G 

[BL0CKS3 PR00HS0F 

35 [BLOCKS] PR004S0E 

EBL0CKS3 PR00HS0D 

IBL0CKS3 PR0Qi*S0C 

[BLOCKS J PR004S0B 

EBL0CKS3 PR00MS0A 

40 ESC0P3 dlosa 1-37.1-S-13 Calmodulin [(Paramecium 

tetraurelia) fle-25 

ESC0P3 dlrec 1-37. 1-5. 51 Recoverin [bovine (Bos 

taurus) le-72 

ESC0P3 dla4pa_ 1-37-1-2.S Calcyclin (S100) [Human (Homo 

45 sapiens) i PI 7e-0S 

ESC0P3 dlrro 1.37.1.4-1 Oncomodulin [rat (Rattus 

norvegicus) 2e-17 

ESC0P3 dlsyma_ 1-37. 1-2. 2 Calcyclin (S100) Erat (Rattus 

norvegicus) le-lM 

50 [SCOPJ dHicb 1.3?. 1. 1.1 Calbindin DTK Ebovine (Bos 

taurus) 2e-lfi 

ESC0P3 dlauib_ 1-37. 1.5. M Calcineurin regulatory subunit 

(B-chain le-MS 

EPIRKII3 blocked amino end le-IT 

55 EPIRKU3 phosphotransferase 3e-0fl 

EPIRKU3 duplication 7e-17 

EPIRKliD tandem repeat 7e-0b 

[PIRKliD heterodimer 7e-17 



CPIRKliD heart 7e-0b 

EPIRKUJ serine/threonine-specif ic protein kinase 7e-0b 

EPIRKtO acetylated amino end 7e-Dfe> 

EPIRKU3 ATP 7e-0b 

CPIRKU3 skeletal muscle 7e-0b 

CPIRKLU signal transduction Me-bl 

CPIRKUO protein kinase 3e-06 

EPIRKU3 calcium binding le-Tl 

CPIRKLID alternative splicing le-13 

EPIRKtO lipoprotein le-n 

CPIRKQO cardiac muscle 7e-0b 

CPIRKUJ muscle 7e-0b 

EPIRKW3 myristylation le-=n 

CPIRKyi EF hand le-IT 

EPIRKUJ retina le-Mb 

CSUPFAMll calcium-dependent protein kinase 3e-dfl 

ESUPFAHJ unassigned calmodulin-related proteins 2e-3 l 4 

CSUPFAHJ protein kinase homology 3e-0fl 

CSUPFAH3 calmodulin le-TT 

ESUPFAHJ calmodulin repeat homology le-TT 

CPR0SITE3 EF_HAND 3 

CPFAMl EF hand 

IKkU All_Alpha 

EtCblJ 3D 



SEQ GGSGADLGEHSCRPASQPRFPRPAEARSHPATRRPASGPAMGKTNSKLAPEVLEDLVQNT
1rec-

HHHHHHHHHTTTT 


SEQ EFSEQELKQWYKGFLKDCPSGILNLEEFQQLYIKFFPYGDASKFAQHAFRTFDKNGDGTI

1rec- CCCHHHHHHHHHHHHHHTTTTEEEHHHHHHHHHHHTTTTCHHHHHHHHHHHH

--CEE 

SEQ DFREFICALSVTSRGSFEQKLNWAFEMYDLDGDGRITRLEMLEIIEAIYKMVGTVIMMRM
1rec-

EHHHHHHHHHHHHCCCGGGHHHHHHHHHTTTTCCCEEHHHHHHHHHHHHHCCTTTTGGGC 

SEQ NQDGLTPQQRVDKIFKKMDQDKDDQITLEEFKEAAKSDPSIVLLLQCDMQK
1rec- TTTTTCHHHHHHHHHHHHCCTTTTEECHHHHHHHHHHCHHHHHHHCCCHHH






Prosite for DKFZphamy2_2gl2.1

PS00018 113->126 EF_HAND PDOC00018
PS00018 149->162 EF_HAND PDOC00018
PS00018 EF_HAND PDOC00018
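The EF_HAND hits above come from a PROSITE pattern search. As a sketch, the EF-hand calcium-binding pattern can be written as a Python regular expression. The pattern used here is PROSITE PS00018, D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW], reproduced from memory; check it against the current PROSITE release before relying on it.

Applied to the 191-residue DKFZphamy2_2gl2 peptide, the sketch finds matches starting at residues 73, 109 and 159. The Pedant tables number residues differently: they use a 231-residue frame-1 translation that begins 40 codons upstream of the ATG at nucleotide 121.

```python
import re

# PS00018-style EF-hand pattern (as recalled here; verify against PROSITE):
# D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]
# The zero-width lookahead lets overlapping matches be reported, as a pattern scan would.
EF_HAND = re.compile(
    r"(?=D[^W][DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC]..[DE][LIVMFYW])"
)
MOTIF_LENGTH = 13


def ef_hand_sites(protein: str) -> list[tuple[int, int]]:
    """Return 1-based, inclusive (start, end) spans of EF-hand pattern matches."""
    return [
        (m.start() + 1, m.start() + MOTIF_LENGTH)
        for m in EF_HAND.finditer(protein.upper())
    ]


# Residues 61-90 of the DKFZphamy2_2gl2 peptide contain the first site (residue 73).
print(ef_hand_sites("ASKFAQHAFRTFDKNGDGTIDFREFICALS"))  # [(13, 25)]
```

A regular expression is only a convenience here. Real PROSITE scans also apply the entry's own rules, such as cut-offs and skip flags, which this sketch ignores.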



Pfam for DKFZphamy2_2gl2.1 



HMM_NAME EF hand

HMn *EIqEnFrmnDk»GDGyIDFEEFmennkem* 

a +FR +DK+GDG+IDF EF+ +++ 
Query 104 FAQHAFRTFDKNGDGTIDFREFICALSVT 132

27-15 14D Ita 1 21 dkfzphamy2_2gl2.1 strong 

similarity to NVL-H (Rattus norvegicus) 
Alignment to HMM consensus:

HMM *ElqEHFrmMDkDGDGylDFEEFmellHkem*

++++F+M+D DGDG+I+ E++E++ ++ 
dkfzphamy2 140 KLNWAFEMYDLDGDGRITRLEMLEIIEAI 168



10 fluery 218 1 ' 2=} dkf zphamy2_2gl2-l strong 

similarity to NVL-2 (Rattus norvegicus) 

Alignment to HMM consensus:
HMM *EIqEMFrmri]>kDGDGyIDFEEFmeMHkem*

++++F++MD+D+D +1+ EEF+E+ K+ 
Query 190 RVDKIFKKMDQDKDDQITLEEFKEAAKSD 218
group: amygdala derived

DKFZphamy2_2il7 encodes a novel 462 amino acid protein without
similarity to known proteins.

Most ESTs are derived from brain and pancreas.
No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of amygdala-specific genes.

unknown protein
perhaps complete cds.

Sequenced by MediGenomix
Locus: unknown

Insert length: 3473 bp

Poly A stretch at pos. 3454, polyadenylation signal at pos. 3436
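The source does not say how the poly A stretch and polyadenylation signal positions were determined. One simple way to locate such features is sketched below. It assumes that the tail is the terminal run of at least ten A residues, and that the signal is the last AATAAA or ATTAAA hexamer upstream of the tail; other signal variants are ignored. The helper names are hypothetical, and the sketch is not expected to reproduce the exact positions above, whose counting conventions are unknown.

```python
import re
from typing import Optional

# Canonical polyadenylation hexamer and its most common variant (an assumption of this sketch).
SIGNAL = re.compile(r"(?=(AATAAA|ATTAAA))")


def poly_a_tail_start(seq: str, min_len: int = 10) -> Optional[int]:
    """1-based start of the terminal run of A's, or None if it is shorter than min_len."""
    seq = seq.upper()
    body = seq.rstrip("A")
    if len(seq) - len(body) < min_len:
        return None
    return len(body) + 1


def signal_before(seq: str, limit: int) -> Optional[int]:
    """1-based start of the last AATAAA/ATTAAA hexamer beginning before position limit."""
    starts = [m.start() + 1 for m in SIGNAL.finditer(seq.upper()) if m.start() + 1 < limit]
    return starts[-1] if starts else None


# Illustrative fragment with a synthetic 15-nt tail, not a full record.
example = "GCCAGAATAAACACACTCTGT" + "A" * 15
tail = poly_a_tail_start(example)
print(tail, signal_before(example, tail))  # 22 6
```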



1 GATATCCCAA TCTTTGGACT GCATCCTGGT TGCCTCTACT GTGGTCACCT
51 TTGGGAAGAA ATGTCTTCTG TAAAAAGAAG TCTGAAGCAA GAAATAGTTA
101 CTCAGTTTCA CTGTTCAGCT GCTGAAGGAG ATATTGCCAA GTTAACAGGA
151 ATACTCAGTC ATTCTCCATC TCTTCTCAAT GAAACTTCTG AAAATGGCTG
201 GACTGCTTTA ATGTATGCGG CAAGGAATGG GCACCCAGAG ATAGTCCAAT
251 TTCTGCTTGA GAAAGGGTGT GACAGATCAA TTGTCAATAA ATCAAGGCAG
301 ACTGCACTGG ATATTGCTGT ATTTTGGGGT TATAAGCATA TAGCTAATTT
351 ACTAGCTACT GCTAAAGGTG GGAAGAAGCC TTGGTTCCTA ACGAATGAAG
401 TGGAAGAATG TGAAAATTAT TTTAGCAAAA CACTACTGGA CCGGAAAAGT
451 GAAAAGAGGA ATAATTCTGA CTGGCTGCTA GCTAAAGAAA GCCATCCAGC
501 CACAGTTTTT ATTCTTTTCT CAGATTTAAA TCCCTTGGTT ACTCTAGGTG
551 GCAATAAAGA AAGTTTCCAA CAGCCAGAAG TTAGGCTTTG TCAGCTGAAC
601 TACACAGATA TAAAGGATTA TTTGGCCCAG CCTGAGAAGA TCACCTTGAT
651 TTTTCTTGGA GTAGAACTTG AAATAAAAGA CAAACTACTT AATTATGCTG
701 GTGAAGTCCC GAGAGAGGAG GAAGATGGAT TGGTTGCCTG GTTTGCTCTA
751 GGTATAGATC CTATTGCTGC TGAAGAATTC AAGCAAAGAC ATGAAAATTG
801 TTACTTTCTT CATCCTCCTA TGCCAGCCCT TCTGCAATTG AAAGAAAAAG
851 AAGCTGGGGT TGTAGCTCAA GCAAGATCTG TTCTTGCCTG GCACAGTCGA
901 TACAAGTTTT GCCCAACCTG TGGAAATGCA ACTAAAATTG AAGAAGGTGG
951 CTATAAGAGA CTATGTTTAA AAGAAGACTG TCCTAGTCTC AATGGCGTCC
1001 ATAATACCTC ATACCCAAGA GTTGATCCAG TAGTAATCAT GCAAGTTATT
1051 CATCCAGATG GGACCAAATG CCTTTTAGGC AGGCAGAAAA GATTTCCCCC
1101 AGGCATGTTT ACTTGCCTTG CTGGATTTAT TGAGCCTGGA GAGACAATAG
1151 AAGATGCTGT TAGGAGAGAA GTAGAAGAGG AAAGTGGAGT CAAAGTTGGC
1201 CATGTTCAGT ATGTTGCTTG TCAACCATGG CCAATGCCTT CCTCCTTAAT
1251 GATTGGTTGC TTAGCTCTAG CAGTGTCTAC AGAAATTAAA GTTGACAAGA
1301 ATGAAATAGA GGATGCCCGC TGGTTCACTA GAGAACAGGT CCTGGATGTT
1351 CTGACCAAAG GGAAGCAGCA GGCATTCTTT GTGCCACCAA GCCGAGCTAT
1401 TGCACATCAA TTAATCAAAC ACTGGATTAG AATAAATCCT AATCTCTAAA
1451 TCTAAGAACT AAGCTTTGAG TATTATTTAA TAATTTCTAA TAACACTCAT
1501 TCCTCAAGTG ATATTAGAGA TTATTCAGTA CTCTTGAGAG TGTCACAACA
1551 CAAAATACGA TGTTGGGTTT TCGAAATATT TTCAAAGTGT TCTGTCTTAA
1601 TCACAAATTC ATATTTTTAC ACATTTTTAC AATATTGCCT CAGATTATGT
1651 TAAATTTGGG TCAGTCTTCT CTGAACTTTT TCTCTCTCGG TTTCTTTTCT
1701 TCCTTCACAG TTTTATCTCA CAAAACCATT TTTCTAATAA GAGACATCAT
1751 GTTGGAAAGA TGTTGTAGAA ATGTGCATAA ATTTCAGTGC CTCTTGTAAG
1801 CATTAAACTG ATGATGAAGA AAGTTCCTGA TTTGAGAAAT GAATCAAAGT
1851 AATTTTAATG AATTTTTAGC TTGTATTAGC TTGAGTTAGC TGGCATTGAT
1901 TTTTTAGTCC TTTTGTTACC TTTAAGTTGT CAATATATGG TTTTTGTTCA
1951 TCTCCCCATT GTAGTCCCAC TTGCTCTTTC CTGGGGGTTC CATTGTTCTA
2001 GCAGTGGAGG TGTTACAGTG TCGCCACTCG TCTAATTTGA CCAGTGTTAA
2051 GAATTTTCTA ATTTAATAAT TTAATAGTGA TCTCAATACC ACACCCTCAT
2101 GGAAGGAGAA AAGCATACTA TTATATCTGG GACCTCTCTT TTAGACCTAA
2151 AATTAATTAA CATATCTACT TATATGTTAC TTATACCTAA AGCTGTTATT
2201 AAGACAAACC AAGATTCTCT GCTTTTGCAC TGAAATTAAA CTTGAAAGGA
2251 ATTCTCCTCA AAGGTCGGAT ATTAAATAAG TCCCAGGCAG ATTTACATAT
2301 TTAATTTAAA ACATTGGCTT TATTTCATTT TGTGATGAGT GATGTATCTG
2351 TGTTAACAAA AAATTGTATA ATCATTACCA ATACTATTTA TTATGCTCAA
2401 ATATATCTTG GCTTTGACCT TATTTCAACA CATTCTAAGA AGCCTTGACA
2451 AAGTAAGTAT ATTTTAGAGC TGAATCAGTA AGATTCTAGA GAAAGCAAAA
2501 CATAGTAGTT CACAATTTTG CAACATAGAA AGTCACATTT TGAAAGGCTA
2551 TTTTGAAATT GATTTAATAG CTATTATAGT TTATGAATAT CAAAATTTGT
2601 ATAATTTGCA TCTTTACTAA TGTATGCTAG AGCTACAAGA GACCTTAAGG
2651 ATAATATATG AAATTAGCTT TCCTTATTTT ATAGATAAGG AAAAAGAAAT
2701 TGTGAAAGGT GAATTTACCT AATTAGTGAA AGTTACATAA CTAATTACAA
2751 CAGTCTGTAC TATATAATGC AGAGGACGAT TCTCCCTGTA AAAGGAACTA
2801 GAAGCTATTA CTAAAAATAT ATATAGACAA AATTAAAAGA AGGAATGATA
2851 AGAATAAATT TAATTTACCA AATATTGTTA ATTAAAATTT TAGATACTTA
2901 ACATTTATTT AACTTAAATA AAAGATAACT GTCAGATAAA ACTTTATTTT
2951 ACTAATGAGC AGTGATTTTC TTAGGAATTG ATGAAGGCTT ATTGGTATCA
3001 AGAATTTAAA CCAAATTAAA ACTGACAGAG GACATTTAGA TACATAATAA
3051 AATTCGAGCT ACATAAGTAT ATGGAAAATA ATGTACCTTG ATTATTATGA
3101 AATAGAGCAT CTTGAAATTC AGTTTTACTC TAAATGTACT TTTAATACTT
3151 GCAGATTCTA AGATTACATT GTGAAATTCC AGGTTTTCAT AATGTTAAAA
3201 TAGGAAAGTA GAATATAAAG TATCAACAAG TGTAGTTATA CATTTTGTTT
3251 TGGATATTTA ATCCTTACTT GGGAAAAAAT CAGCATCTAG GTAAATTATT
3301 ATTTTAATAA GAACTCTTAA ATTGCCAACC TCTGAGAGGT GAAAAGCTAT
3351 GTAAATAGAA GGAATGGCCA GTTCAAAAGA ATAGTAGAAG TGATAGTGCC
3401 GTGAATGTAT TCTACTGGAA ATGAATGTAA TAATACATTA AATTTTTAAA
3451 ATCGAAAAAA AAAAAAAAAA AAA



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1 
ORF from 61 bp to 1446 bp; peptide length: 462
Category: putative protein
Classification: unclassified
Prosite motifs: MUTT (355-375)



1 MSSVKRSLKQ EIVTQFHCSA AEGDIAKLTG ILSHSPSLLN ETSENGWTAL

51 MYAARNGHPE IVQFLLEKGC DRSIVNKSRQ TALDIAVFWG YKHIANLLAT

101 AKGGKKPWFL TNEVEECENY FSKTLLDRKS EKRNNSDWLL AKESHPATVF

151 ILFSDLNPLV TLGGNKESFQ QPEVRLCQLN YTDIKDYLAQ PEKITLIFLG

201 VELEIKDKLL NYAGEVPREE EDGLVAWFAL GIDPIAAEEF KQRHENCYFL

251 HPPMPALLQL KEKEAGVVAQ ARSVLAWHSR YKFCPTCGNA TKIEEGGYKR

301 LCLKEDCPSL NGVHNTSYPR VDPVVIMQVI HPDGTKCLLG RQKRFPPGMF

351 TCLAGFIEPG ETIEDAVRRE VEEESGVKVG HVQYVACQPW PMPSSLMIGC

401 LALAVSTEIK VDKNEIEDAR WFTREQVLDV LTKGKQQAFF VPPSRAIAHQ
451 LIKHWIRINP NL



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphamy2_2il7, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphamy2_2il7, frame 1

Report for DKFZphamy2_2il7.1

[LENGTH] 462

IE MO 5207b. 25 

CpID b-3fl 

EHOMOLJ TREriBL:SPBC177fl_3 gene: "SPBC177fl.Q3c n i product: 

"conserved hypothetical protein"* S.pombe chromosome II cosmid 
40 cl77fl. le-MS 

EFUNCAT3 =11 unclassified proteins ES. cerevisiaei YGLDb7wJ 

Me-3H 

EFUNCAT3 r general function prediction EH- influenzae! 

HI0132 pyrophosphohydrolase3 Me-E 1 * 
45 EFUNCAT3 1 genome replication-i transcription-i recombination and 
repair EM. jannaschii-i MJim^ nucleotide pyrophosphohydrolasel 
le-DM 

EBL0CKS1 BLD02nF Anion exchangers family proteins 
EBL0CKS3 BL01213B 
50 EBL0CKS3 

EBL0CKS3 PF00023A 

EBL0CKS3 BLOOe^ mutT domain proteins 

ESC0P3 dlawcb_ 1. =11. 3. 1-2 GA binding protein (GABP) alpha 

GA bindini Be-35 
55 ESUPFAMJ hypothetical protein HIDM32 le-22 
EPR0SITE3 MUTT 1 

EPFAM3 Bacterial mutT protein 

EPFAM3 Ank repeat 
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EKkD Irregular 
1Kb) 31 3D 



SEQ MSSVKRSLKQEIVTQFHCSAAEGDIAKLTGILSHSPSLLNETSENGWTALMYAARNGHPE
1awcB CCCTTTTCTTTCCHHHHHHHHTTHHHHHHHHHCCCTT-

TTEETTTEEHHHHHHHHCCHH

SEQ IVQFLLEKGCDRSIVNKSRQTALDIAVFWGYKHIANLLATAKGGKKPWFLTNEVEECENY
1awcB

HHHHHHHHCCTTTTCBTTTBCHHHHHHHHCCHHHHHHH

SEQ FSKTLLDRKSEKRNNSDWLLAKESHPATVFILFSDLNPLVTLGGNKESFQQPEVRLCQLN
1awcB

SEQ YTDIKDYLAQPEKITLIFLGVELEIKDKLLNYAGEVPREEEDGLVAWFALGIDPIAAEEF
1awcB

SEQ KQRHENCYFLHPPMPALLQLKEKEAGVVAQARSVLAWHSRYKFCPTCGNATKIEEGGYKR
1awcB

SEQ LCLKEDCPSLNGVHNTSYPRVDPVVIMQVIHPDGTKCLLGRQKRFPPGMFTCLAGFIEPG
1awcB

SEQ ETIEDAVRREVEEESGVKVGHVQYVACQPWPMPSSLMIGCLALAVSTEIKVDKNEIEDAR
1awcB

SEQ WFTREQVLDVLTKGKQQAFFVPPSRAIAHQLIKHWIRINPNL
1awcB



Prosite for DKFZphamy2_2il7.1
PS00893 355->375 MUTT PDOC00695



Pfam for DKFZphamy2_2il7.1



HMM_NAME Ank repeat



HMM *GyTPLHIAARyNNvEHVrlLLflHGADIN*
G+T+L++AAR+++ E+V++LL++G D

Query 46 GWTALMYAARNGHPEIVQFLLEKGCDRS 73



HMM_NAME Bacterial mutT protein
HMM *ILHiqRedppnHYdtHhgdUIFPGGkIEeGETPEflCarREIUEETGI*
L++++++ +++ ++G+IE+GET+E+++RRE++EE+G+
Query 337 CLLGRQKRF--PPG--MFTCLAGFIEPGETIEDAVRREVEEESGV 377
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+ 
group: intracellular transport and trafficking

DKFZphamy2_2ol3 encodes a novel 590 amino acid protein with high
similarity to murine synaptotagmin 3.

The novel protein contains two C2 domains. The C2 domain is
thought to be involved in calcium-dependent phospholipid binding.
Synaptotagmins are essential for Ca2+-regulated exocytosis of
neurosecretory vesicles.

The new protein can find application in modulating or blocking
synaptic activity.

similarity to synaptotagmin 3 (Mus musculus)

Sequenced by MediGenomix

Locus: unknown

Insert length: 2931 bp

Poly A stretch at pos. 2912, polyadenylation signal at pos. 2884



1 ACTCTATGTC TCCTCTCGTT GGATTGTGAC ACCGGGAGGT CAGGGAACTC
51 CAGGACCTTG TTCTCTGCTG GATTCGCAGC AACCAGCACA GCACGTAGGG
101 CGTAGTTGGT GCTGGATGGA TGTTTGTTGA ATGAATGAAT GATGAATGGC
151 TGGCACCTTG TCTGCTCATC CCTAACTCCT GTTCCTTCAT CTGTGCAGCC
201 CTAATCTTTG TTTCCTCATC TGTCCATCCC TTTATTTGTG CATCCTCATT
251 CTTAGCCCCT TCACTGCCCT TCTCCATCTC TTCCTCCTTG TTCATTTGTC
301 CCTGTTCTCT GTCCTCTACT CCACTCATGC CCATCTCTGT CCCCTTGACT
351 TACCCAGTCC CTGCTACTAT CTCCATCCCT AATTTCTGCC CTCTTGTCTG
401 TCTACTCCTA ATTCCTTTTC CTTGTCCATC CCTAATACCT GTCACCTTGT
451 CCTTCTTCCT CGAATCTCCA TCCCTAATCC ATCTGCCCCT AATCTCTGTC
501 CCCTTTGCCC ATCCTTCCTT TTCTCGGTGT CTCTTTCCAC CCTTATCTCC
551 ACACCTGCCC ACCCTGCACT CCCATTCTGT TTCCCATCTG CACCCTTGCC
601 CCATCCCTCC CACACACAGG ACCAGACGGC CACCATGTCA GGAGACTACG
651 AGGATGACCT CTGCCGGCGG GCACTCATCC TGGTCTCGGA CCTCTGTGCG
701 CGGGTCCGAG ATGCTGACAC CAACGACAGG TGCCAGGAGT TCAATGACCG
751 AATCCGAGGC TATCCCCGGG GTCCAGATGC AGACATCTCC GTGAGCCTGC
801 TGTCGGTCAT CGTGACATTC TGTGGCATTG TCCTTCTGGG TGTCTCTCTC
851 TTCGTGTCCT GGAAGTTGTG CTGGGTGCCC TGGCGGGACA AGGGAGGCTC
901 GGCAGTGGGC GGTGGCCCCC TGCGCAAAGA CCTAGGCCCT GGTGTCGGGC
951 TGGCAGGCCT GGTAGGCGGA GGCGGGCACC ACCTGGCGGC TGGCCTGGGT
1001 GGCCATCCTC TGCTGGGCGG CCCACACCAC CATGCCCATG CCGCCCACCA
1051 TCCACCCTTT GCTGAGCTGC TGGAGCCAGG CAGCCTGGGG GGTTCTGACA
1101 CCCCTGAGCC CTCCTACTTG GACATGGACT CGTATCCAGA GGCTGCAGCA
1151 GCAGCAGTGG CCGCTGGGGT CAAACCGAGC CAAACATCCC CTGAGCTGCC
1201 CTCTGAGGGG GGAGCAGGCT CTGGGTTGCT CCTGCTGCCC CCCAGTGGTG
1251 GGGGCTTGCC CAGTGCCCAG TCACATCAGC AGGTCACAAG CCTGGCACCC
1301 ACTACCAGGT ACCCAGCCCT GCCCCGACCC CTCACCCAGC AGACTCTGAC
1351 CTCCCAGCCG GACCCCAGCA GTGAGGAGCG CCCACCTGCC CTGCCCTTAC
1401 CCCTGCCTGG AGGCGAGGAA AAAGCCAAAC TCATTGGGCA GATTAAGCCA
1451 GAGCTGTACC AGGGGACTGG CCCTGGTGGC CGGCGGAGCG GTGGGGGCCC
1501 AGGCTCTGGA GAGGCAGGCA CAGGGGCACC CTGTGGCCGT ATCAGCTTCG
1551 CCCTGCGGTA CCTCTATGGC TCGGACCAGC TGGTGGTGAG GATCCTGCAG
1601 GCCCTGGACC TCCCTGCCAA GGACTCCAAC GGCTTCTCAG ACCCCTACGT
1651 CAAGATCTAC CTGCTGCCTG ACCGCAAGAA AAAGTTTCAG ACCAAGGTGC
1701 ACAGGAAGAC CCTGAACCCC GTCTTCAATG AGACGTTTCA ATTCTCGGTG
1751 CCCCTGGCCG AGCTGGCCCA ACGCAAACTG CACTTCAGCG TCTATGACTT
1801 TGACCGCTTC TCGCGGCACG ACCTCATCGG CCAGGTGGTG CTGGACAACC
1851 TCCTGGAGCT GGCCGAGCAG CCCCCTGACC GCCCGCTCTG GAGGGACATC
1901 GTGGAGGGCG GCTCGGAAAA AGCAGATCTT GGGGAGCTCA ACTTCTCACT
1951 CTGCTACCTC CCCACGGCCG GGCGCCTCAC CGTGACCATC ATCAAAGCCT
2001 CTAACCTCAA AGCGATGGAC CTCACTGGCT TCTCAGACCC CTACGTGAAG
2051 GCCTCCCTGA TCAGCGAGGG GCGGCGTCTG AAGAAGCGGA AAACCTCCAT
2101 CAAGAAGAAC ACGCTGAACC CCACCTATAA TGAGGCGCTG GTGTTCGACG
2151 TGGCCCCCGA GAGCGTGGAG AACGTGGGGC TCAGCATCGC CGTGGTGGAC
2201 TACGACTGCA TCGGGCACAA CGAGGTGATC GGCGTGTGCC GTGTGGGCCC
2251 CGACGCTGCC GACCCGCACG GCCGCGAGCA CTGGGCAGAG ATGCTGGCCA
2301 ATCCCCGCAA GCCCGTGGAG CACTGGCATC AGCTAGTGGA GGAAAAGACT
2351 GTGACCAGCT TCACAAAAGG CAGCAAAGGA CTATCAGAGA AAGAGAACTC
2401 CGAGTGAGGG GTCTGGCCTA GGCCCGGGAT CGGACCAGGC TCCCTCAGGA
2451 CCCCATCCTT TCCTGCCCGG ACCGTGAATT CATCTCCTTG AAGCCATAAC
2501 GTCCGAGCTG CTGGTGCGGG GCAGCCCTGG CCCTAGGCTT CCTAACCCTG
2551 GAAGCGAGAG GATGAGAGGA GGCCGGCCCA GCTCCTTCTT TCAGGGTGGG
2601 GGTCATTCAG CCTCCACTGT GTCTGTCTTT TCTTCCCTGG GGCTCCCCCT
2651 CGAGGCGAGG GGCCATGCAT GTCTGGGGGA CCCCTGCCCC CCAAAACCCT
2701 CTGTCTGTCT CTGTCTCTTT GCTGTTTGTC CAAGACTCAG TGTCCCGACC
2751 CTTGTTCTCG CCGTGAATGT CAATGGGCCA ATCCTCTCTG TCCTTTCAGA
2801 CACACACACA CCTGTGTCCA CCCCTTCTGT TCGCCACACC CTGCGTCTGG
2851 CCGGTCCCCC CACTGCTGCT GCTATCAACG CCAGAATAAA CACACTCTGT
2901 GGGTCTCACT CCAAAAAAAA AAAAAAAAAA A



BLAST Results 



Entry MMABfi^.l from database TREMBL:

product: "synaptotagmin 3"; Mus musculus mRNA for synaptotagmin 3,
complete cds.

Score = Ifimi P = S.7e-23 e J, identities = 3b2/M50i positives = 
40 3b1/M50i 
frame +2 






Medline entries 



1b0bM733: 

Fukuda M, Kojima T, Aruga J, Niinobe M, Mikoshiba K. Functional
diversity of C2 domains of synaptotagmin family. Mutational analysis
of inositol high polyphosphate binding domain. J Biol Chem 1995
Nov 3;270(44):26523-7.


Peptide information for frame 2 
ORF from 635 bp to 2404 bp; peptide length: 590
Category: strong similarity to known protein
Classification: Cell signaling/communication
Prosite motifs: C2_DOMAIN_1 (323-338)
C2_DOMAIN_1 (455-470)



1 MSGDYEDDLC RRALILVSDL CARVRDADTN 

51 ISVSLLSVIV TFCGIVLLGV SLFVSWKLCIiJ 

101 GPGVGLAGLV GGGGHHLAAG LGGHPLLGGP 

151 LGGSDTPEPS YLDMDSYPEA AAAAVAAGVK 

SD1 LPPSGGGLPS AflSHflflVTSL APTTRYPALP 

ESI PALPLPLPGG EEKAKLIGfll KPELYflGTGP 

3D1 GRISFALRYL YGSDflLVVRI LlJALDLPAKD 

351 FCJTKVHRKTL NPVFNETF<3F SVPLAELAflR 

401 VVLDNLLELA EfiPPDRPLUR DIVEGGSEKA 

451 TIIKASNLKA I1DLTGFSDPY VKASLISEGR 

501 ALVFDVAPES VENVGLSIAV VDYDCIGHNE 

551 AEflLANPRKP VEHUHtfLVEE KTVTSFTKGS 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_2ol3, frame 2 

TREMBL:|iriABa , 13_l product: "synaptotagmin 3"i flus musculus mRNA 
for 

synaptotagmin 3i complete cds-i N = 2i Score = lfll4i P = l.le-23^ 



>TREI1BL:riI1ABfl c 13_l product: "synaptotagmin 3"i Mus musculus mRNA 
for 

synaptotagmin 3i complete cds- 
Length = 567 

HSPs: 

Score = lfil4 (272.2 bits)-. Expect = l-le-23 c li Sum P(2) = Lle- 
231 

Identities = 3b2/MM e J (flOX)-, Positives = 31^/441 ( S2/C ) 

fluery: 142 FAELLEPGSLGGSDTPEPSYLDIIDSYPEXXXXXX- 
XXGVKPSdTXXXXXXXXXXXXXXXX 2D0 

FAELLEPG LGGS+ PEPSYLDPIDSYPE GVKPSC3T 

Sbjct: 143 

FAELLEPGGLGGSELPEPSYLDMDSYPEAAVASVVAAGVKPSCJTSPELPSEGGTGSGLLL 2G2 
fiuery: 201 

XXXXXXXXXXX<2SH(2flVTSLAPTTRYPALPRPLTi2t2TLTS<2P])XXXXXXXXXXXXXXXXX 2b0 

flSHtJCVTSLAPTTRYPALPRPLTflflTLT+fl D 

Sbjct: 2D3 

LPPSGGGLPSAlJSHflflVTSLAPTTRYPALPRPLTiaflTLTTflADPSTEERPPALPLPLPGG 2fe>2 



DRCcJEFNDRI RGYPRGPDAD 
VPURDKGGSA VGGGPLRKDL 
HHHAHAAHHP PFAELLEPGS 
PSflTSPELPS EGGAGSGLLL 
RPLTfllJTLTS fiPDPSSEERP 
GGRRSGGGPG SGEAGTGAPC 
SNGFSDPYVK IYLLPDRKKK 
KLHFSVYDFD RFSRHDLIGfl 
DLGELNFSLC YLPTAGRLTV 
RLKKRKTSIK KNTLNPTYNE 
VIGVCRVGPD AADPHGREHIi) 
KGLSEKENSE 
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fluery: 2bl 

XXKAKLIGfllKPELYflXXXXXXXXXXXXXXXXXXXXXXPCGRISFALRYLYGSDflLVVRI 320 

KAKLIG<2IKPELY(2 
PCGRISFALRYLYGSI><aLVVRI 

5 Sbjct: 2b3 EEKAKLIG(2IKPELY(2GTGPGGRRGGGSGEAGA 

PCGRISFALRYLYGSDflLVVRI 31? 

<3uery: 3B1 

Li3ALDLPAKI>SNGFSDPYVKIYLLP]>RKKKFl3TKVHRKTLNPVFNETFi3FSVPLAELAi3R 360 

10 

LflALDLPAKDSNGFS5PYVKIYLLPDRKKKFfiTKVHRKTLNP+FNETFt2FSVPLAELA<2R 
Sbjct: 318 

L<2ALDLPAKDSNGFSDPYV<IYLLPDRKKKFflTKVHRKTLNPIFNETF(JFSVPLAELA(3R 377 
15 fluery: 381 

KLHFSVYDFDRFSRHDLIGflVVLDNLLELAEflPPDRPLWRDIVEGGSEKADLGELNFSLC MMO 

KLHFSVYDFDRFSRHDLIGflVVLDNLLELAEflPPDRPLURDI+EGGSEKADLGELNFSLC 
Sbjct: 378 

20 KLHFSVYDFDRFSRHDLIGJJVVLDNLLELAEUPPDRPLURDILEGGSEKADLGELNFSLC H37 
fluery: MM1 

YLPTAGRLTVTIIKASNLKAMDLTGFSDPYVKASLISEGRRLKKRKTSIKKNTLNPTYNE SOD 

25 YLPTAGRLTVTIIKASNLKAMDLTGFSDPYVKASLISEGRRLKKRKTSIKKNTLNPTYNE 
Sbjct: M38 

YLPTAGRLTVTIIKASNLKAH1>LTGFSI>PYVKASLISEGRRLKKR<TSIKKNTLNPTYNE Ml? 
fluery: 501 

30 ALVFDVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGPDAADPHGREHUAEMLANPRKP SbD 

ALVFDVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGP+AADPHGREHUAEflLANPRKP 
Sb jet > ^fl 

ALVFDVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGPEAADPHGREHUAEMLANPRKP 557 

35 

fluery: Sbl VEMJHflLVEEKTVTSFTKGSKGLSEKENSE 51D 

VEHUHULVEEKT++SFTKG KGLSEKENSE 
Sbjct: 558 VEHUHULVEEKTLSSFTKGGKGLSEKENSE 587 

40 Score = 520 (78-0 bits)-, Expect = 1-16-23=5-. Sum P(2) = l.le-231 
Identities = W1DD (18*) -. Positives = Tl/lOO CH*) 

fluery: 1 MSGDYEDDLCRRALILVSDLCARVRDADTNDRCflEFND- 

RIRGYPRGPBADISVSLLSVI 5=1 
45 MSGDYED])LCRRALILVSDLCARVRDADTNDRCflEFN + 

RIRGYPRGPDADISVSLLSVI 
Sbjct: 1 

MSGDYEDDLCRRALILVSDLCARVRDADTNDRCflEFNELRIRGYPRGPDADISVSLLSVI bD 

50 (Juery: bD VTFCGIVLLGVSLFVSIilKLCUVPURDKGGSAVGGGPLRKD =H 

VTFCGIVLLGVSLFVSWKLCUVPWRDKGGSAVGGGPLRKD 
Sbjct: bl VTFCGIVLLGVSLFVSlilKLCIilVPIilRDKGGSAVGGGPLRKD 1DD 
[LENGTH] 590

EHIO b33tm.02 

EpID b-lb 

EH0H0L3 TREMBL : n(1AB6T3_l product: 



"synaptotagmin 3"i Mus 



musculus mRNA for synaptotagmin 3-, complete cds. D-D 



[FUNCAT]
be-10
[FUNCAT]
• '-'CS •
[FUNCAT]
?e-0b
[BLOCKS]
proteins
[BLOCKS]
[BLOCKS]
[SCOP]
(beta)
[SCOP]



11 unclassified proteins 



ES- cerevisiaei YI1L07Sc3 



01.06.01 lipid, fatty-acid and sterol biosynthesis [S. cerevisiae, YGR170w] 7e-06
30.08 organization of golgi [S. cerevisiae, YGR170w]

BL01224A N-acetyl-gamma-glutamyl-phosphate reductase

BL01013B Oxysterol-binding protein family proteins
PFD13bflB 

dlaHSa_ 2-t-l-2-2 C2 domain from protein kinase c 
ERa 2e-27 

d1rsy 2.6.1.2.1 Synaptotagmin I, first C2 domain



ERat (Rattu 4e-43 



[SCOP]



dlrlw 2-b-l-l-2 A domain from cytosolic 



phospholipase A2 EHuma Se-12 



[SCOP]

phospholipase 
EPIRKWJ 
EPIRKIO 
EPIRKLO 
EPIRKliU 
EPIRKbO 
EPIRKLD 
EPIRKbO 
EPIRKLU 
EPIRKliU 
EPIRKIO 
CPIRKU3 
EPIRKbO 
EPIRKIO 
EPIRKIO 
EPIRKIO 
EPIRKIO 
EPIRKIO 
EPIRKIO 
ESUPFAI1J 
ESUPFAMJ 
ESUPFAMJ 
ESUPFAIU 
ESUPFAMJ 
ESUPFAMJ 
CSUPFAM3I 
ESUPFAMJ 



d1qasb2 2.6.1.1.1 Phosphoinositide-specific
C 4e-27 
phosphotransferase 7e-15 
duplication be-7b 
synaptic vesicle le-lt7 
phorbol ester binding 2e-lM 
zinc 2e-11 

transmembrane protein □-□ 

serine/threonine-specific protein kinase 7e-15
membrane trafficking □-□ 
phospholipid binding be-7b 
autophosphorylation 7e-lS 
ATP 7e-lS 

phosphoprotein 7e-lS 
glycoprotein le-lb7 
calcium binding 5e-34 
alternative splicing le-10 
dimer le-7S 

membrane protein le-lb7 
calmodulin binding 2e-7 l 4 
ras-specific GAP catalytic domain homology le-Dfl 
protein kinase C zinc-binding repeat homology ?e-15 
protein kinase homology 7erlS 
protein kinase C alpha 7e-15 
HsC2 phosphatidylinositol 3-kinase le-01 
synaptotagmin □-□ 
PX domain homology le-DT 
pleckstrin repeat homology le-Dfl 
protein kinase C C2 region homology D-D 



[SUPFAM]
[PROSITE] C2_DOMAIN_1 2
[PFAM] C2 domain

[KW] Irregular
[KW] 3D

[KW] LOW_COMPLEXITY



20- DO •/. 
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SEfl MSGDYEDDLCRRALILVSDLCARVRDADTNDRCflEFNDRIRGYPRGPDADISVSLLSVIV 

SEG 

Irsy- 


SEfl TFCGIVLLGVSLFVSUKLCWVPURDKGGSAVGGGPLROLGPGVGLAGLVGGGGHHLAAG 

SEG xxxxxxxxxxxxxxxxxxxxx 

Irsy- 


SEfl LGGHPLLGGPHHHAHAAHHPPFAELLEPGSLGGSDTPEPSYLDNDSYPEAAAAAVAAGVK 

SEG xxxxxxxxxxxxxxxxxxxxx xxxxxxxx- • • 

Irsy- 


SEfl PSflTSPELPSEGGAGSGLLLLPPSGGGLPSAflSHflflVTSLAPTTRYPALPRPLTflflTLTS 

SEG • • • -xxxxxxxxxxxxxxxxxxxxxxxxxxx 

lrsy- 


SEfl APDPSSEERPPALPLPLPGGEEKAKLIGfllKPELYflGTGPGGRRSGGGPGSGEAGTGAPC 

SEG • • • xxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxx . • 

lrsy- 


SEfl GRISFALRYLYGSDflLVVRILflALDLPAKDSNGFSDPYVKIYLLPDRKKKFflTKVHRKTL 

SEG 

lrsy- 

CEEEEEEEEETTTTEEEEEEEEEECCCCCBTTTBBCEEEEEEEETTTTTTEECCCTTTBT

SEfl NPVFNETFflFSVPLAELAflRKLHFSVYDFDRFSRHDLIGflVVLDNLLELAEflPPDRPLUR 

SEG , 

lrsy- 

TTEEEEEEEECCCHHHHHCCEEEEEEEECTTTTCCEEEEE

SEfl DIVEGGSEKADLGELNFSLCYLPTAGRLTVTIIKASNLKAMDLTGFSDPYVKASLISEGR 

SEG 

lrsy- 


SEfl RLKKRKTSIKKNTLNPTYNEALVFBVAPESVENVGLSIAVVDYDCIGHNEVIGVCRVGPD 

SEG 

lrsy- 


SEfl AADPHGREHUAEMLANPRKPVEHUHflLVEEKTVTSFTKGSKGLSEKENSE 

SEG 

Irsy- 




Prosite for DKFZphamy2_2ol3.2 

PS00499 323->339 C2_DOMAIN_1 PDOC00380

PS00499 455->471 C2_DOMAIN_1 PDOC00380
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Pfam for DKFZphamy2_2ol3 • 2 



HMM NAME C2 domain

HMM

*LtVrIIeARNLIilkf1Drin6f SDPYVKVdMdPdpkDtklCUKTkTilJNNCLN 

L+VRI++A +L+++D+NGFSDPYVK++++PD+K 

10 KK++TK+++++ LN 

fluery 31b LVVRILflALDLPAKDSNGFSDPYVKIYLLPDRK— 

KKFflTKVHRKT-LN 3bl 

HUM PVUNEEeFvFedlPyPdlqrkHLRFaVUDUDRFSRBDFIGHCi* 
15 PV+N E+F+F +P+ +L+ + L+F+V+D+DRFSR+D+IG+++ 

fluery 3bB PVFN-ETFflFS-VPLAELAflRKLHFSVYDFDRFSRHDLIGflVV 

102 

HUM 

20 *LtVrIIeARNLUkI1I>MnGf SDPYVKVdHdPdpkDtk<li)KTkTilJNNGLN 

LTV+II+A NL++MI> +GFSDPYVK +++ + 

+++KK+KT++++N+ LN 
fluery IMS 

LTVTIIKASNLKAflDLTGFSDPYVKASLISEGRRLKKRKTSIKKNT-LN M^S 


Hnn PVhlNEEeFvFedlPyPdlqrkMLRFaVIJDUDRFSRBDFIGHCi* 

P++N E +VF+ ++ ++ +++ L +AV D+D+++++++IG+C+ 
fluery PTYN-EALVFD-VAPESVENVGLSIAVVDYDCIGHNEVIGVCR 
53b 

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group : differentiation/development 


DKFZphamy2_7j5 encodes a novel 693 amino acid protein with
similarity to Tspyl1, the testis-specific Y-encoded-like protein of
Mus musculus.

TSPY genes are arranged in clusters on the Y chromosome of many
mammalian species. TSPY is believed to function in early
spermatogenesis and is a candidate for GBY, the putative
gonadoblastoma-inducing gene on the Y chromosome. The TSPY family
forms part of a superfamily, TTSN, whose autosomal members are highly
conserved in mammals and beyond.

The new protein can find application in studying the expression
profile of testis- and brain-specific genes and in the diagnosis and
therapy of male fertility disorders.


HRIHFB2216

similarity to Y-linked Gene of Mus musculus 

Sequenced by BMFZ 

Locus: unknown 

Insert length: aan bp

Poly A stretch at pos. 2800, polyadenylation signal at pos. 2779






1 AGGAGAGCTG GTTGCGTGAG TCTCCTCAGC TCTGCTTACC GGTGCGACTA
51 GCGGCAGCGA CGCGGCTAAA AGCGAAGGGG CGAGTGCGAG TCCCCTGAGC
101 TGTACGAACG CGGTCGCCAT GGACCGCCCA GATGAGGGGC CTCCGGCCAA
151 GACCCGCCGC CTGAGCAGCT CCGAGTCTCC ACAGCGCGAC CCGCCCCCGC
201 CGCCGCCGCC GCCGCCGCTC CTCCGACTGC CGCTGCCTCC ACCCCAGCAG
251 CGCCCGAGGC TCCAGGAGGA AACGGAGGCG GCACAGGTGC TGGCCGATAT
301 Gf^GGGGGGlG GGACTGGGCC CCGCGCTGCC CCCGCCGCCT CCCTATGTCA
351 TTCTCGAGGA GGGGGGGATC CGCGCATACT TCACGCTCGG TGCTGAGTGT
401 CCCGGCTGGG ATTCTACCAT CGAGTCGGGG TATGGGGAGG CGCCCCCGCC
451 CACGGAGAGC CTGGAAGCAC TCCCCACTCC TGAGGCCTCG GGGGGGAGCC
501 TGGAAATCGA TTTTCAGGTT GTACAGTCGA GCAGTTTTGG TGGAGAGGGG
551 GCCCTAGAAA CCTGTAGCGC AGTGGGGTGG GCGCCCCAGA GGTTAGTTGA
601 CCCGAAGAGC AAGGAAGAGG CGATCATCAT AGTGGAGGAT GAGGATGAGG
651 ATGAGCGGGA GAGTATGAGG AGCAGCAGGA GGCGGCGGCG GCGGCGGAGG
701 AGGAAGCAGA GGAAGGTGAA GAGGGAAAGC AGAGAGAGAA ATGCCGAGAG
751 GATGGAGAGC ATCCTGCAGG CACTGGAGGA TATTCAGCTG GATCTGGAGG
801 CAGTGAACAT CAAGGCAGGC AAAGCCTTCC TGCGTCTCAA GCGCAAGTTC
851 ATCCAGATGC GAAGACCCTT CCTGGAGCGC AGAGACCTCA TCATCCAGCA
901 TATCCCAGGC TTCTGGGTCA AAGCATTCCT CAACCACCCC AGAATTTCAA
951 TTTTGATCAA CCGACGTGAT GAAGACATTT TCCGCTACTT GACCAATCTG
1001 CAGGTACAGG ATCTCAGACA TATCTCCATG GGCTACAAAA TGAAGCTGTA
1051 CTTCCAGACT AACCCCTACT TCACAAACAT GGTGATTGTC AAGGAGTTCC
1101 AGCGCAACCG CTCAGGCCGG CTGGTGTCTC ACTCAACCCC AATCCGCTGG
1151 CACCGGGGCC AGGAACCCCA GGCCCGTCGT CACGGGAACC AGGATGCGAG
1201 CCACAGCTTT TTCAGCTGGT TCTCAAACCA TAGCCTCCCA GAGGCTGACA
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1251 GGATTGCTGA GATTATCAAG AATGATCTGT GGGTTAACCC TCTACGCTAC 

1301 TACCTGAGAG AAAGGGGCTC CAGGATAAAG AGAAAGAAGC AAGAAATGAA
1351 GAAACGTAAA ACCAGGGGCA GATGTGAGGT GGTGATCATG GAAGACGCCC
1401 CTGACTATTA TGCAGTGGAA GACATTTTCA GCGAGATCTC AGACATTGAT
1451 GAGACAATTC ATGACATCAA GATCTCTGAC TTCATGGAGA CCACCGACTA
1501 CTTCGAGACC ACTGACAATG AGATAACTGA CATCAATGAG AACATCTGCG
1551 ACAGCGAGAA TCCTGACCAC AATGAGGTCC CCAACAACGA GACCACTGAT
1601 AACAACGAGA GTGCTGATGA CCACGAAACC ACTGACAACA ATGAGAGTGC
1651 AGATGACAAC AACGAGAATC CTGAAGACAA TAACAAGAAC ACTGATGACA
1701 ACGAAGAGAA CCCTAACAAC AACGAGAACA CTTACGGCAA CAACTTCTTC
1751 AAAGGTGGCT TCTGGGGCAG CCATGGCAAC AACCAGGACA GCAGCGACAG
1801 TGACAATGAA GCAGATGAGG CCAGTGATGA TGAAGATAAT GATGGCAACG
1851 AAGGTGACAA TGAGGGCAGT GATGATGATG GCAATGAAGG TGACAATGAA
1901 GGCAGCGATG ATGACGACAG AGACATTGAG TACTATGAGA AAGTTATTGA
1951 AGACTTTGAC AAGGATCAGG CTGACTACGA GGACGTGATA GAGATCATCT
2001 CAGACGAATC AGTGGAAGAA GAGGGCATTG AGGAAGGCAT CCAGCAAGAT
2051 GAGGACATCT ATGAGGAAGG AAACTATGAG GAGGAAGGAA GTGAAGATGT
2101 CTGGGAAGAA GGGGAAGATT CGGACGACTC TGACCTAGAG GATGTGCTTC
2151 AGGTCCCAAA CGGTTGGGCC AATCCGGGGA AGAGGGGGAA AACCGGATAA
2201 GGGTTTTCCC CTTTTGGGGA TCACCTCTCT GTATCCCCCA CCCACTATCC
2251 CATTTGCCCT CCTCCTCAGC TAGGGCCACG CGGCCCCACA TTGCACTTCT
2301 GGGGGGTGAC CGACTTCGTA CACGGGTTTA AAGTTTATTT TTATGGTTTA
2351 GTCATTGCAG AGTTCTTATT TTGGGGGGAG GGAAAGGGGG CTAGTCCCCT
2401 TCTTTTGGCC CTCCGCCCCC GCAGGCTTCT GTGTGCTGCT AACTGTATTT
2451 ATTGTGATGC CTTGGTCAGG GCCCCTCTAC CCACTTCTCC CAGTCAGTTG
2501 TGGCCCCAGC CCCTCTCCCT GTGCTGTGTG GAGTGGACAC CCTGACCCCC
2551 GAAGCGGGGA GGGCCGCTGT GGCCTTCGTC ACAGCCGCGC AGTGCCCATG
2601 GAGGCGCTGC TGCCACCTTC CTCTCCCAAG TTCTTTCTCC ATCCCTCTCC
2651 TCTTCCCGCC GCGCCGCTAG CCCGCCTCGG TGTCTATGCA AGGCCGCTTC
2701 GCCATTGCGG TATTCTTTGC GGTATTCTTG TCCCCGTCCC CCAGAAGGCT
2751 CGCCTCTCCC CGTGGACCCT GTTAATCCCA ATAAAATTCT GAGCAAGTTT
2801 AAAAAAAAAA AAAAAAAAA
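The reported poly A and polyadenylation-signal positions can be checked against the sequence itself. The sketch below is a minimal illustration, assuming the canonical AATAAA hexamer and a strict run of A's at the 3' end; the annotation pipeline's actual rules are not stated, and the function name `polya_features` is hypothetical. Applied to the last 69 nucleotides of this insert (starting at position 2751), it returns 2779 and 2800, which agree with the reported values if those are read as 0-based offsets.

```python
from typing import Tuple

SIGNAL = "AATAAA"  # canonical polyadenylation hexamer; variants such as ATTAAA are not searched


def polya_features(fragment: str, first_pos: int) -> Tuple[int, int]:
    """Return (signal_offset, polya_offset) as 0-based positions in the insert.

    fragment  -- the 3' end of the insert (A, C, G, T)
    first_pos -- 1-based position of fragment[0] within the insert
    """
    seq = fragment.upper()
    signal = seq.find(SIGNAL)  # first occurrence, scanning 5' -> 3'
    if signal < 0:
        raise ValueError("no AATAAA signal found")
    tail = len(seq.rstrip("A"))  # index of the first base of the terminal A run
    if tail == len(seq):
        raise ValueError("fragment does not end in a poly(A) run")
    offset = first_pos - 1  # 0-based position of fragment[0]
    return offset + signal, offset + tail


if __name__ == "__main__":
    end_of_insert = "CGCCTCTCCCCGTGGACCCTGTTAATCCCAATAAAATTCTGAGCAAGTTT" + "A" * 19
    print(polya_features(end_of_insert, 2751))  # (2779, 2800)
```

Because `find` returns the first match, the fragment should be limited to the 3' end; an AATAAA further upstream would otherwise be reported instead. A pipeline that tolerates short interruptions in the A run can also report an earlier poly A start than this strict version.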



35 BLAST Results 

No BLAST result 












Medline entries 



Vogel T, Dittrich O, Mehraein Y, Dechend F, Schnieders F, Schmidtke J.
Murine and human TSPYL genes: novel members of the TSPY-SET-NAP1L1
family. Cytogenet Cell Genet 1998;81(3-4):265-70.



Peptide information for frame 2 



ORF from 119 bp to 2197 bp, peptide length: 693
Category: similarity to known protein 
Classification: unclassified 
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1 HDRPDEGPPA KTRRLSSSES PGRDPPPPPP PPPLLRLPLP PP(3(2RPRL(2E 

51 ETEAAflVLAD MRGVGLGPAL PPPPPYVILE EGGIRAYFTL GAECPGUDST 

1D1 IESGYGEAPP PTESLEALPT PEASGGSLEI DF(3VV(3SSSF GGEGALETCS 

5 151 AVGUAPflRLV DPKSKEEAII IVEDEDEDER ESflRSSRRRR RRRRRKfiRKV 

501 KRESRERNAE RMESILfiALE DK2LDLEAVN IKAGKAFLRL KRKFIdMRRP 

251 FLERRDLIId HIPGFWVKAF LNHPRISILI NRRDEDIFRY LTNLflVUDLR 

301 HISMGYKHKL YFflTNPYFTN MVIVKEF<2RN RSGRLVSHST PIRUHRGdEP 

351 UARRHGNflDA SHSFFSUFSN HSLPEADRIA EIIKNDLUVN PLRYYLRERG 

10 HOI SRIKRKKCJEM KKRKTRGRCE VVIMEDAPDY YAVEDIFSEI SDIDETIHDI 

451 KISDFNETTD YFETTDNEIT DINENICDSE NPDHNEVPNN ETTDNNESAD 

501 DHETTDNNES ADDNNENPED NNKNTDDNEE NPNNNENTYG NNFFKGGFUG 

551 SHGNNflDSSD SDNEADEASD DEDNDGNEGD NEGSDDDGNE GDNEGSDDDD 

bOl RDIEYYEKVI EDFDKDdiADY EDVIEIISDE SVEEEGIEEG IflflDEDIYEE 

15 bSl GNYEEEGSED VUEEGEDSDD SDLEDVLflVP NGUANPGKRG KTG 
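The ORF line relates nucleotide coordinates to peptide length and reading frame. A small sketch, assuming 1-based inclusive coordinates that run from the A of the ATG to the last base of the final sense codon with the stop codon excluded (our reading of this entry: the ATG starts at 119 and a TAA stop follows 2197 in the nucleotide sequence). The peptide length is then the span divided by three, and the forward frame is `(start - 1) % 3 + 1`. The helper name `orf_summary` is hypothetical.

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (peptide_length, frame) for a forward-strand ORF.

    start, end -- 1-based inclusive coordinates from the first base of the
                  start codon to the last base of the final sense codon
                  (stop codon excluded)
    """
    span = end - start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    frame = (start - 1) % 3 + 1  # frames 1, 2, 3 on the forward strand
    return span // 3, frame


print(orf_summary(119, 2197))  # (693, 2)
```

The result matches the 693-residue peptide listed for frame 2; if the pipeline counted the stop codon in the ORF span, the division would be off by one codon, which is why the convention is stated explicitly.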



BLASTP hits 

20 

No BLASTP hits available 

Alert BLASTP hits for DKFZphamy2_7j5, frame 2

TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216 mRNA,
partial cds, N = 4, Score = 1393, P = 2.1e-165

TREMBL:HSDJ486I3_2 gene: "dJ486I3.2"; product: "dJ486I3.2 (KIAA0721
(NAP (Nucleosome Assembly Protein) domain containing protein))"; Human
DNA sequence from clone 486I3 on chromosome 6q22.1-22.3. Contains the
part of a gene for a novel protein, the gene for KIAA0721 (NAP
(Nucleosome Assembly Protein) domain containing protein), the TSPYL
gene for TSPY-like (testis specific protein, Y-linked like) and an
RPS5 (40S Ribosomal Protein S5) pseudogene. Contains ESTs, STSs, GSSs
and two putative CpG islands, N = 1, Score = 570, P = 3.4e-55

>TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens HRIHFB2216
mRNA, partial cds

Length = 486

HSPs:

Score = 1393 (209.0 bits), Expect = 2.1e-165, Sum P(4) = 2.1e-165

Identities = 268/295 (90%), Positives = 268/295 (90%)






(2uery: 208 

NAERMESILflALEDIflLDLEAVNIKAGKAFLRLKRKFIflMRRPFLERRDLIIflHIPGFIiJV 2b? 
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NAERMESILdALEDIfiLDLEAVNIKAGKAFLRLKRKFIflflRRPFLERRDLIIflHIPGFWV 
Sbjct: 1 

NAERHESIL(3ALEI>I(2LPLEAVNIKA6ICAFLRLKRKFI(2nRRPFLERRI>LII(2HIPGFUV bO 


(Query: Ebfi 

KAFLNHPRISILINRRDEDIFRYLTNL£3Vfll>LRHISM6YKI1KLYFl3TNPYFTNriVIVKEF 327 

KAFLNHPRISILINRR1)EI>IFRYLTNL(3V(2DLRHISMGYK[1KLYF(2TNPYFTNI1VIVKEF 
10 Sbjct: bl 

KAFLNHPRISILINRR»EI>IFRYLTNL(3V(3DLRHISI1GY<I1KLYFI3TNPYFTNMVIVKEF 120 
fluery: 32fl 

flRNRSGRLVSHSTPIRWHRGCiEPflARRHGNflDAXXXXXXXXXXXXLPEADRIAEIIKNDL 3fl7 
15 (2RNRSGRLVSHSTPIRWHRG<aEPl2ARRHGN<2DA 
LPEADRIAEIIKNDL 
Sbjct: 121 

(SRNRSGRLVSHSTPIRUHRGdEPflARRHGNfiDASHSFFSUFSNHSLPEADRIAEIIKNDL ISO 



20 <2uery: 3fifl 

UVNPLRYYLRERGSXXXXXXXXXXXXXXXGRCEVViriEDAPDYYAVEDIFSEISDIDETI 447 

UVNPLRYYLRER6S 
GRCEVVIHEDAPDYYAVEDIFSEISDIDETI 
Sbjct: lfll 

25 UVNPLRYYLRERGSRIKRKKflEMKKRKTRGRCEVVIHEDAPDYYAVEDIFSEISDIDETI 240 



fluery: 44fl 

HDKISDFMETTDYFETTDNEITDINENICDSENPDHNEVPNNETTDNNESADDH SD2 

30 HDIKISDFflETTDYFETTBNEITDINENICDSENPDHNEVPNNETTDNNESADDH 
Sbjct: 241 

HDIKISDFMETTDYFETTBNEITDINENICDSENPDHNEVPNNETT5NNESADDH 215 

Score = 117 (17.6 bits), Expect = 1.0e-11, Sum P(4) = 1.0e-11
Identities = 32/77 (41%), Positives = 44/77 (57%)

fluery: 42b 

DAPDYYAVEDIFSEISDIDETIHDIKISDFHETTDYFETTDNEITDINENICDSENPDHN 4fl5 
+ DY+ I> +EI+DI+E ID E D+ E +NE TD NE+ 

40 D E D+N 

Sbjct: 250 ETTDYFETTD — NEITDINENICD 

SENPDHNEVPNNETTDNNESADDHETTDNN 301 

fluery: 4flb EVP— NNETT-DNNESADDH 502 
45 E NNE DNN++ DD + 

Sbjct: 302 ESADDNNENPEDNNKNTDDN 321 






Score = 94 (14.1 bits), Expect = 2.1e-165, Sum P(4) = 2.1e-165
Identities = 16/16 (100%), Positives = 16/16 (100%)

Query: 678 QVPNGWANPGKRGKTG 693

QVPNGWANPGKRGKTG
Sbjct: 471 QVPNGWANPGKRGKTG 486
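For an ungapped HSP, the raw Score is the sum of substitution-matrix values over the aligned pairs. The listing does not name the matrix; assuming BLOSUM62 (the usual BLASTP default), the sketch below reproduces 94 for the 16-residue alignment read here as QVPNGWANPGKRGKTG, and 87 and 85 for the 14/14 and 16/18 HSPs of the same hit. Only the BLOSUM62 entries needed for these examples are included, so this is an illustration rather than a general scorer; the function names are hypothetical.

```python
# Excerpt of BLOSUM62: the full diagonal plus the two mismatch pairs used below.
BLOSUM62_DIAG = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}
BLOSUM62_PAIRS = {frozenset("GV"): -3, frozenset("KR"): 2}


def blosum62(a: str, b: str) -> int:
    """Substitution score for one aligned pair (KeyError if not in this excerpt)."""
    if a == b:
        return BLOSUM62_DIAG[a]
    return BLOSUM62_PAIRS[frozenset((a, b))]


def ungapped_score(query: str, sbjct: str) -> int:
    """Raw score of an ungapped alignment; no gap penalties are involved."""
    if len(query) != len(sbjct):
        raise ValueError("ungapped alignments must have equal length")
    return sum(blosum62(q, s) for q, s in zip(query, sbjct))


print(ungapped_score("QVPNGWANPGKRGKTG", "QVPNGWANPGKRGKTG"))  # 94
```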

Score = 90 (13.5 bits), Expect = 1.9e-16, Sum P(4) = 1.9e-16
Identities = 34/85 (40%), Positives = 45/85 (52%)






WO 01/98454 PCT/IB01/02050 

fiuery: 4Bb DAPDYYAVEDIFSEISDIDETIHDIKISDFHE TTDYFETTDN- 

EITDINENICDS 47^ 

+ I>Y+ ]> +EI+DI+E I D + D E TTD E+ I>+ E TD 

NE+ D + 

5 Sbjct: BSD ETTDYFETTD-- 

NEITDINENICDSENPDHNEVPNNETTDNNESADDHETTDNNESADDN 3D? 

fiuery: 4fiO -ENPDHN EVPNN-ETTDNN 41b 

ENP+ N E PNN E T N 

10 Sbjct: 30fl NENPEDNNKNTDDNEENPNNNENTYGN 334 

Score = 87 (13.1 bits), Expect = 2.1e-165, Sum P(4) = 2.1e-165
Identities = 14/14 (100%), Positives = 14/14 (100%)

Query: 543 FFKGGFWGSHGNNQ 556

FFKGGFWGSHGNNQ
Sbjct: 336 FFKGGFWGSHGNNQ 349

Score = 85 (12.8 bits), Expect = 2.1e-165, Sum P(4) = 2.1e-165
Identities = 16/18 (88%), Positives = 17/18 (94%)

Query: 601 RDIEYYEKVIEDFDKDQA 618

RDIEYYEK IEDFD+DQA
Sbjct: 394 RDIEYYEKGIEDFDRDQA 411
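The Identities and Positives lines can be derived from the middle line of each alignment, where a letter marks an identical pair, `+` a conservative substitution and a blank a mismatch. A sketch, assuming the midline keeps its original spacing (PDF extraction often collapses blanks) and that percentages are truncated rather than rounded, which is what 16/18 (88%) indicates; `identity_stats` is a hypothetical name.

```python
from typing import Tuple


def identity_stats(midline: str) -> Tuple[int, int, int, int]:
    """Return (identities, identity_pct, positives, positive_pct) from a BLAST midline."""
    length = len(midline)  # alignment length, including mismatch blanks
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, identities * 100 // length, positives, positives * 100 // length


print(identity_stats("RDIEYYEK IEDFD+DQA"))  # (16, 88, 17, 94)
```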

Score = 60 (9.0 bits), Expect = 5.3e-03, Sum P(4) = 5.3e-03
Identities = 21/66 (31%), Positives = 33/66 (50%)






fluery: 4Bb DAPDYYAVEDIFSEISDIDETIHD-IKIS- 
30 DFMETTDYFETTDNEITDINENICDSENPD 4fl3 

D DY V +1 S+ S +E I + 1+ D E +Y E ++ + E+ 

DS+ D 

Sbjct: 401 

DflADYEDVIEIISDESVEEEGIEEGIflfiDEDIYEEGNYEEEGSEIWWEEGEDSDDSDLED 4bfi 


fluery: 4fl4 HNEVPN Mfll 
+VPN 

Sbjct: 4b1 VU3VPN 474 

Score = 49 (7.4 bits), Expect = 1.4e-06, Sum P(4) = 1.4e-06
Identities = 12/35 (34%), Positives = 21/35 (60%)

fluery: 4b3 ETTDNEITDINENICDSENPDHNEVPNNETTDNNE 417 
E +D+E D NE + + D NE +NE +D+++ 
45 Sbjct: 3bO EASDDEDNDGNEGDNEGSDDDGNE-GDNEGSDDDD 313 

Score = 42 (6.3 bits), Expect = 7.2e-06, Sum P(4) = 7.2e-06
Identities = 11/37 (29%), Positives = 18/37 (48%)

50 Query: 4bS TDNEITDINENICDSENPDHNEVPNNETTDNNESADD SQ1 

+DNE + + J> E+ D NE N + ]>+ D + 
Sbjct: 3S4 SDNEADEAS -DDEDNDGNEGDNEGSDDDGNEGDN 3flb 

Pedant information for DKFZphamy2_7j5, frame 2
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ELENGTHJ fc. 6 ^ 

ElllilJ 7TM35 - 07 

[pI] 4.45

[HOMOL] TREMBL:AB015345_1 gene: "HRIHFB2216"; Homo sapiens
HRIHFB2216 mRNA, partial cds. 1e-171

[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YKR048c] 4e-05

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YKR048c] 4e-05

[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YKR048c] 4e-05

[FUNCAT] 01.13 biogenesis of chromosome structure [S. cerevisiae, YKR048c] 4e-05

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR048c] 4e-05

[BLOCKS] BPOELMbH

[BLOCKS] BPD2b4bE

[BLOCKS] PFD0M2MA

[BLOCKS] BL00415N Synapsins proteins

[BLOCKS] BPD27nE

[BLOCKS] BL00048 Protamine P1 proteins

[BLOCKS] PROOOMID

[BLOCKS] PFQDTSbD

[BLOCKS] PFDOISbC

[BLOCKS] PFDOISbB

[PIRKW] nucleus 8e-33

[PIRKW] phosphoprotein 8e-33

[PIRKW] alternative splicing 8e-33

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 35.35 %



35 SE<3 NDRPDEGPPAKTRRLSSSESPlJRDPPPPPPPPPLLRLPLPPPfltJRPRLflEETEAAtiJVLAD 

SEC xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhh 

SEfi MRGVGLGPALPPPPPYVILEEGGIRAYFTLGAECPGUDSTIESGYGEAPPPTESLEALPT 

40 SEG • - - .xxxxxxxxxxx 

PRD ccccceeeeccccccccccccccceeeeccccccccccceeecccccccccchhhhhhhh 

SE<2 PEASGGSLEIDF(3VV<2SSSFGGEGALETCSAVGUAPl2RLVDPKSKEEAIIIVEDEDEDER 

SEG ■ xxxxxxxx 

45 PRD hcccccccccceeeeecccccchhhhhhhhccccccccccccchhhhhhhhhhhhhhhhh 

SE(3 ESriRSSRRRRRRRRRKflRKVKRESRERNAERMESILflALEDIflLDLEAVNIKAGKAFLRL 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

50 

SE<2 KRKFI(3l1RRPFLERRDLIl£3HIPGFUv'KAFLNHPRISILINRRDEDIFRYLTNLflV*<2DLR 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhccccceeeccccccchhhhhccchhhhhhhhhhhhhhhc 

55 SE<2 HISriGYKnKLYF(2TNPYFTNMVIVKEFJ2RNRSGRLVSHSTPIRlilHRG(3EP(3ARRHGN(2DA 

SEG 

PRD cccccceeeeeeccccccchhhhhhhcccccccceeeccccccccccccccccccccccc 
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SEfl SHSFFSUFSNHSLPEADRIAEIIKNDLUVNPLRYYLRERGSRIKRKK<2Ef1KKRKTRGRC.E 

SEG xxxxxxxxxxxx xxxxxxxxxxxxxxx- • • • 

PRD cccceeeccccccccchhhhhhhhhhhhcccchhhhhhhhhhhhhhhcceeeeecccccc 

5 SEfl VVIMEDAPDYYAVEDIFSEISDIDETIHDIKISDFPIETTDYFETTDNEITDINENICDSE 

SEG 

PRD eeeccccccceeehhhhhhhhhhccccccceeeeeccccccccccchhhhhhhhcccccc 

SEC NPDHNEVPNNETTDNNESADDHETTDNNESADDNNENPEDNNKNTDDNEENPNNNENTYG 

10 SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccccceeeecccccccccccccccccchhhhhcccccceeeeeccccccccccccccc 

SEfl NNFFKGGFUGSHGNNflDSSDSDNEADEASDDEDNDGNEGDNEGSDDDGNEGDNEGSDDDD 

SEG XX xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

15 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SE(3 RDIEYYEKVIEDFDKDflADYEDVIEIISDESVEEEGIEEGIflflDEDIYEEGNYEEEGSED 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cchhhhhhhhhhhcccccchhhhheeeecccccccccccccccccceeecccccccccce 


SEfl VUEEGEDSDDSDLEDVLGVPNGlilANPGKRGKTG 

SEG xxxxxxxxxxxxxxxxx 

PRD eeecccccccccceeeeeccccccccccccccc 


(No Prosite data available for DKFZphamy2_7j5.2) 
(No Pfam data available for DKFZphamy2_7j5.2)


Pedant information for DKFZphamy2_7j5, frame 3
Report for DKFZphamy2_7j5.3


[LENGTH] 150

Epii 12. aa 

[BLOCKS] PR0030aA
[KW] All_Alpha
[KW] LOW_COMPLEXITY 61.33 %



45 SEfl HRTSATARILTTMRSPTTRPLITTRVLf1TTKPLTTnRVl2riTTTRILKTITRTLf1TTKRTL 

SEG xxxxxxxxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhcccccccccceeeeccccccchhhhhhhhhhhhhhhhhhhccccccc 

SEC TTTRTLTATTSSKVASGAAnATTRTAATVTPIKflriRPVririKiririATKVTriRAVritlllAIIKVT 

50 SEG xxxxxxxxxx xxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxx 

PRD ccccceeecccccccchhhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhhhhhhh 

SEfl MKAAMIITTETLSTflRKLLKTLTRIRLTTRT 

SEG xxxxxxxx- xxxxxxxxxxxxxxxxxxxxx 

55 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhccc 



(No Prosite data available for DKFZphamy2_7j5.3)
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(No Pfam data available for DKFZphamy2_7 j5-3) 

DKFZphfbr2_7ficl2 
----



group: nucleic acid management 

DKFZphfbr2_7flcl2 encodes a novel 528 amino acid protein with high
similarity to glutamyl-tRNA (Gln) amidotransferase subunit A of
the hyperthermophilic bacterium Aquifex aeolicus.

The novel protein contains one ATP/GTP-binding site motif A (P-loop).
This loop interacts with one of the phosphate groups of a bound ATP
or GTP. It is found in numerous ATP- or GTP-binding proteins, such as
ATP synthase alpha and beta subunits, myosin heavy chains, kinesin
heavy chains and kinesin-like proteins, dynamins and dynamin-like
proteins, several kinases, DNA and RNA helicases, GTP-binding
elongation factors and the Ras family of GTP-binding proteins. The
protein seems to be expressed ubiquitously.

The new protein can find application in the modulation of
translational pathways.



similarity to glutamyl-tRNA (Gln) amidotransferase subunit A
(Aquifex aeolicus)

Sequenced by MediGenomix

Locus: /map= T, bflfci-3 cR from top of Chrb linkage group" 


Insert length: 3244 bp

Poly A stretch at pos. 3222, polyadenylation signal at pos. 3204



1 AGTGACAATT AAAGATGGCT GCGCCCATGT AACATCACTA GCGACCGGTG
51 ACCTCTTTTT CCCCCTTGCC TGGCTCCTGT GGTGGCAGGC TGGGCACGAG
101 GACCATGCTG GGCCGGAGCC TCCGAGAAGT TTCTGCGGCA CTGAAACAAG
151 GCCAAATTAC ACCAACAGAG CTCTGTCAAA AATGTCTCTC TCTTATCAAG
201 AAGGCCAAGT TTCTAAATGC CTACATTACT GTGTCAGAAG AGGTGGCCTT
251 AAAACAAGCT GAAGAATCAG AAAAGAGATA TAAGAATGGA CAGTCACTTG
301 GGGATTTAGA TGGAATTCCT ATTGCAGTAA AAGACAATTT CAGCACTTCT
351 GGCATTGAGA CAACATGTGC ATCAAATATG CTGAAAGGTT ATATACCACC
401 TTATAATGCT ACAGTAGTTC AGAAGTTGTT GGATCAGGGA GCTCTACTAA
451 TGGGAAAAAC AAATTTAGAT GAGTTTGCTA TGGGATCTGG GAGCACAGAT
501 GGTGTATTTG GACCAGTTAA AAACCCCTGG AGTTATTCAA AACGATATAG
551 AGAAAAGAGG AAGCAGAATC CCCACAGCGA GAATGAAGAT TCAGACTGGC
601 TGATAACTGG AGGAAGCCCA GGTGGGAGTG CAGCTGCTGT ATCGGCGTTC
651 ACATGCTACG CGGCTTTAGG ATCAGATACA GGAGGATCGA CCAGAAATCC
701 TGCTGCCCAC TGTGGGCTTG TTGGTTTCAA ACCAAGCTAT GGCTTAGTTT
751 CCCGTCATGG TCTCATTCCC CTGGTGAATT CGATGGATGT GCCAGGAATC
801 TTAACCAGAT GTGTGGATGA TGCAGCAATT GTGTTGGGTG CACTGGCCGG
851 ACCTGACCCC AGGGACTCTA CCACAGTACA TGAACCTATT AATAAACCAT
901 TCATGCTTCC CAGTTTGGCA GATGTGAGCA AACTATGTAT AGGAATTCCA
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151 AAGGAATATC TTGTACCGGA ATTATCAAGT GAAGTACAGT CTCTTTGGTC 

1001 CAAAGCTGCT GACCTCTTTG AGTCTGAGGG GGCCAAAGTA ATTGAAGTAT
1051 CCCTTCCTCA CACCAGTTAT TCAATTGTCT GCTACCATGT ATTGTGCACA
1101 TCAGAAGTGG CATCGAATAT GGCAAGATTT GATGGGCTAC AATATGGTCA
1151 CAGATGTGAC ATTGATGTGT CCACTGAAGC CATGTATGCT GCAACCAGAC
1201 GAGAAGGATT TAATGATGTG GTGAGAGGAA GAATTCTCTC AGGAAACTTT
1251 TTCTTATTAA AAGAAAACTA TGAAAATTAT TTTGTCAAAG CACAGAAAGT
1301 GAGACGCCTC ATTGCTAATG ACTTTGTAAA TGCTTTTAAC TCTGGAGTAG
1351 ATGTCTTGCT AACTCCCACC ACCTTGAGTG AGGCAGTACC ATACTTGGAG
1401 TTCATCAAAG AGGACAACAG AACCCGAAGT GCCCAGGATG ATATTTTTAC
1451 ACAAGCTGTA AATATGGCAG GATTGCCAGC AGTGAGTATC CCTGTTGCAC
1501 TCTCAAACCA AGGGTTGCCA ATAGGACTGC AGTTTATTGG ACGTGCGTTT
1551 TGTGACCAGC AGCTTCTTAC AGTAGCCAAA TGGTTTGAAA AACAAGTACA
1601 GTTTCCTGTT ATTCAACTTC AAGAACTCAT GGATGATTGT TCAGCAGTCC
1651 TTGAAAATGA AAAGTTAGCC TCTGTCTCTC TAAAACAGTA AACATATCTT
1701 ACAAATTAAA ATGACTTTTA GGCTGGGTGC AGTGGCTCAC ACCTGTAATC
1751 CCAGCACTTT GGGAGGCCAA GGCGAGCGGA TCATGAGGTC AGAAGATCTA
1801 GAACAGCCTG GTCAACATGG TGAAACCCCG TCTCTACTAA AAATACAAAA
1851 ATTAGCCAGG CTTAGTGGCG GGCATCTGTA GTCCCAGCTA CTCAGGAGGC
1901 TGAGGCAGGA GAATCACTTG AACCCTGGAG GTGGAGGTTG CAGTGAGCCG
1951 AGATCATGCC ACTGCACTGC ACTCCAGCCT GGGTGACAAA GCAAGACTGT
2001 GTCTCAAAAT AAATAAATAA AATAAAATAA AATGACGTAC AGAGATTCTA
2051 TATTCTAGAG AGTCAAATGG TCTTGCTCAA TTCTTGTAAT TAGGTTCTTG
2101 TTAATACAGT CATTCCATGG AATTACTTTT TAAAATTCCT GTGACAATTA
2151 ATAATAAATA ACGTGTCAGC ATTTAGTAAG CATCCACTAA GTGTACAATA
2201 CTTCTACAAT AACACAAGAT ACCTGTTCCT CAAAGACAAT GCATTCTGCC
2251 ATAATGTTCA TTAAAGAGTT TACAGTAAAA ATAAGATTAG GGATAAACTT
2301 CTCAAAAATT GTACATCTGT GTAACTAAAG CACTAACAAA AACATGAATA
2351 GTCCTTCTAG AGGTAACTTG GATAGCCTAG GCAGGCAACT TATCATGTGG
2401 TGAAGGCCGC CTCAGGGGTT GTTAAAAATG CACAGAAACA ATTGAGTGCG
2451 ATTATTGGCT TCTGAGCGCT GAGCAGAGCA GGTGGAAGAG GAACTTTGAG
2501 CACAGGAGGA AATGCAACCA GTCAGGGCCC AGAATCATGC AAATCTCAGG
2551 GGTATGCCTC TCTGGGGAGG AGCTCCACTT GCAGGGACTC CTTTTATTTC
2601 CCTAAGAAAG AGCTGAAATG ACTGAGAACT TTCCTTTCCT CCTTAGAGTT
2651 ACAATTTTAC TTCTGCTATT CCGGAGCCCA TGCCTAGAAG CCAGAACAAC
2701 TCCATGTTAC ACTGAGTTCA TGCTCCTATT TACTGATCAC AAATGAGCTC
2751 ATTAATGTCA TCGAAACATT TATTGTAACC TAACAGACCA TCACAGATTG
2801 GAAACTTGGT AGATAGCAGA GCATGGTATT AGTGAAAAAG GTTCAAAATA
2851 CACAAGTAAC ATACACTCTG AAAAACATGC AGATAATTTG CTGATGAAGC
2901 AGAAGAGGGG ATGCGCATGG CAAGAACTTG CCTTACCCCA GATTCTCTAT
2951 ATCTCATGGT TTCCTTTTCC TCTTGACTGT CTTTACGAGT GTTTTTTATT
3001 TGGGACCCTC GAGCCCAGAG ATATTAATGG ATATCTGTAT TCAATATTTG
3051 ACAAAATCTA ATGGAAACCA TCCATTTACT CATGATAAGG CTTCATCACT
3101 GGATTTCTGT GTCTTCACTA GAACACCATT GTCATCTCAT ATTGATCAGG
3151 TATTTTAATC TAGCACTTAC ATATTGTTGA TAAATGAAAG CTGAATTGTT
3201 ACTTAATAAA TTCACTTTGT TTAGCAAAAA AAAAAAAAAA AAAA
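The frame-3 peptide of this clone can be checked against its nucleotide sequence by translating from the ATG that begins at position 105. A minimal sketch using the standard genetic code (an assumption: the listing does not state the code table); the names `CODON_TABLE` and `translate` are hypothetical. Nucleotides 105-164, taken from the rows starting at 101 and 151, should give the first 20 residues, MLGRSLREVSAALKQGQITP.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ...; '*' marks stops.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nt: str) -> str:
    """Translate forward-strand DNA codon by codon; a trailing partial codon is ignored."""
    seq = "".join(nt.split()).upper()  # drop the blanks between 10-base groups
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


orf_start = "ATGCTG GGCCGGAGCC TCCGAGAAGT TTCTGCGGCA CTGAAACAAG GCCAAATTAC ACCA"
print(translate(orf_start))  # MLGRSLREVSAALKQGQITP
```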



BLAST Results 


No BLAST result 



55 Medline entries 



No Medline entry 
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Peptide information for frame 3 



ORF from 105 bp to 1688 bp, peptide length: 528
Category: similarity to known protein 
Classification: Protein management 
Prosite motifs: ATP_GTP_A (112-119)



1 MLGRSLREVS AALKQGQITP TELCQKCLSL IKKAKFLNAY ITVSEEVALK

51 QAEESEKRYK NGQSLGDLDG IPIAVKDNFS TSGIETTCAS NMLKGYIPPY

101 NATVVQKLLD QGALLMGKTN LDEFAMGSGS TDGVFGPVKN PWSYSKRYRE

151 KRKflNPHSEN EDSDULITGG SPGGSAAAVS AFTCYAALGS DTGGSTRNPA 

SD1 AHCGLVGFKP SYGLVSRHGL IPLVNSMDVP GILTRCVDDA AIVLGALAGP 

251 DPRDSTTVHE PINKPFMLPS LADVSKLCIG IPKEYLVPEL SSEVjJSLWSK 

301 AADLFESEGA KVIEVSLPHT SYSIVCYHVL CTSEVASNflA RFDGLfiYGHR 

20 3S1 CDIDVSTEAM YAATRREGFN DVVRGRILSG NFFLLKENYE NYFVKAdKVR 

M01 RLIANDFVNA FNSGVDVLLT PTTLSEAVPY LEFIKEDNRT RSAflDDIFTfl 

MSI AVNflAGLPAV SIPVALSNC2G LPIGLflFIGR AFCD(2<2LLTV AKIi)FEK<2Vl2F 

SOI PVI(2L<2ELI1D DCSAVLENEK LASVSLKfl 
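The ATP_GTP_A hit corresponds to the PROSITE P-loop pattern PS00017, [AG]-x(4)-G-K-[ST]. The sketch below is our regular-expression translation of that pattern, not PROSITE's own scanning tool, and `find_p_loops` is a hypothetical name. Run on residues 101-120 of the peptide, which we read as NATVVQKLLD QGALLMGKTN, it finds GALLMGKT at 112-119; the Prosite report's "112->120" may use an exclusive end.

```python
import re
from typing import List, Tuple

# PROSITE PS00017 (ATP_GTP_A): [AG]-x(4)-G-K-[ST]; the lookahead allows overlapping hits.
P_LOOP = re.compile(r"(?=([AG].{4}GK[ST]))")


def find_p_loops(peptide: str, first_residue: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) residue ranges of P-loop matches."""
    hits = []
    for match in P_LOOP.finditer(peptide):
        start = first_residue + match.start(1)
        hits.append((start, start + len(match.group(1)) - 1))
    return hits


print(find_p_loops("NATVVQKLLDQGALLMGKTN", first_residue=101))  # [(112, 119)]
```

A pattern match alone does not show nucleotide binding; PROSITE notes that this motif also occurs by chance in unrelated proteins.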









BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphfbr2_7flcl2, frame 3



PIR:F70322 glutamyl-tRNA (Gln) amidotransferase subunit A - Aquifex

35 aeolicus-. N = 2-, Score = L.20-. P = 1.3e-AT 



>PIR:F70322 glutamyl-tRNA (Gln) amidotransferase subunit A - Aquifex
aeolicus

Length = 478

HSPs: 

45 Score = b20 (13-0 bits)-. Expect = H-Se-flT-. Sum P(2) = M.3e-flT 
Identities = 135/311 (M2JO-. Positives = nS/311 (bl*) 

fluery: 167 

ALGSDTGGSTRNPAAHCGLVGFKPSYGLVSRHGLIPLVNSIIDVPGILTRCVDLAAIVLGA 2Mb 
50 +LGSDTGGS R PA+ CG++G KP+YG VSR+GL+ +S+D G+ R 

+ D A+VL 
Sbjct: 1L3 

SLGSDTGGSIRflPASFCGVIGIKPTYGRVSRYGLVAFASSLDfllGVFGRRTEDVALVLEV 222 

55 Query: EH7 

LAGPDPRDSTTVHEPINKPFriLPSLADVSKLCIGIPKEYLVPELSSEVflSLlilSKAABLFE 3Db 
++G D +DST+ P+ + + +V L IG+PKE+ EL +V+ + 

E 
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Sbjct: EE3 ISGUDEKDSTSAKVPVPE- 
li)SEEVKKEVKGLKIGLPKEFFEYEL<3Pl2VKEAFENFIKELE Sfll 

Query: 3D? 

5 SEGAKVIEVSLPHTSYSIVCYHVLCTSEVASNMARFDGLflYGHRCDIDVSTEAIIYAATRR 3bb 

EG ++ EVSLPH YSI Y+++ SE +SN+AR+DG++YG+R 

HYA TR 

Sbjct: aas 

KEGFEIKEVSLPHVKYSIPTYYIIAPSEASSNLARYDGVRYGYRAKEYKDIFENYARTRD 3M1 


fluery: 3b7 

EGFNDVVRGRILSGNFFLLKENYENYFVKAtfKVRRLIANDFVNAFNSGVDVLLTPTTLSE M2b 
EGF V+ RI+ G F L Y+ Y++KAOKVRRLI NDF+ AF VDV+ 

+PTT 

15 Sbjct: 3ME EGFGPEVKRRIMLGTFALSAGYYDAYYLKAfiKVRRLITNDFLKAFEE- 
VDVIASPTT— P 3«!fl 

fluery: 157 

AVPYLEFIKEDNRTRSAflDDIFTflAVNMAGLPAVSIPVALSNflGLPIGLdFIGRAFCDflC? Mfib 
20 +P+ + +N ]>I T N+AGLPA+SIP+A + GLP+G d 

IG+ + + 

Sbjct: 3"H TLPFKFGERLENPIEflYLSDILTVPANLAGLPAISIPIAUKD- 
GLPVGGflLIGKHIilDETT MS? 

25 fluery: Mfl7 LLTVAK-UFEK(JV(2FPVIj3L 5D5 

LL ++ U +K + I L 
Sbjct: MSA LL(2ISYLUE<2KFKHYEKIPL M77 

Score = Efll (M3-M bits)-. Expect = M ■ 3e-fl c l-» Sum P(E) = M.3e-fl1 
Identities = 64/143 (44%), Positives = 90/143 (62%)

fluery: M RSLREVSAALKflGfllTPTELCflKCLSLIKKAKF- 

LNAYITVSEEVALKflAEESEKRYKNG bE 

+SL E+ LK+G+++P E+ + + + + AYIT ALKflAE 

35 ++R 

Sbjct: 5 

KSLSELRELLKRGEVSPKEVVESFYDRYNflTEEKVKAYITPLYGKALKflAESLKER bO 

(3uery: b3 

40 (3SLGDLl>GIPIAVKDNFSTSGIETTCASNnLKGYIPPYNATVV(3KLLI>(3GALLI1GKTNLI> 1EE 

L L GIPIAVKDN G +TTCAS +L+ ++ PY+ATV+++L 

GAL++GKTNLD 
Sbjct: bl -EL- 

PLFGIPIAVKDNILVEGEKTTCASKILENFVAPYDATVIERLKKAGALIVGKTNLD 118 


(2uery: 1E3 EFAMGSGSTDGVFGPVKNPWSYSK 1Mb 

EFAMGS + F P KNPUI + 
Sbjct: in EFAUGSSTEYSAFFPTKNPWDLER 1M2 


Pedant information for DKFZphfbr2_7flcl2, frame 3



Report for DKFZphfbr2_7flcl2.3




[LENGTH] 528
EflliD S7Mba.7fi 




EpI3 S-S? 

[HOMOL] PIR:E7172S glutamyl-tRNA amidotransferase chain A

(gatA) RP1S2 - Rickettsia prowazekii 2e-13

EFUNCAT3 r general function prediction EM. jannaschii-. 

5 nJllbCU fle-bl 

E"FUNCAT3 D1.QS.D1 nitrogen and sulphur utilization ES. 
cerevisiae-. Yf1R213c3 le-SS 

EFUNCAT3 c energy conversion EH. genitalium-i 1160113 LJe-MI 

EFUNCAT3 01. 01. ID amino-acid degradation ES. cerevisiae-> 
10 YBR20fic3 Ee-31 

EFUNCAT3 01-03.01 purine-ribonucleotide metabolism ES. 
cerevisiae! YBRSOflcl Ee-31 
EBL0CKS3 BL00S71 

EECJ b-3.M.b Urea carboxylase Se-30 

15 EEC! 3-S-m Amidase 3e-31 

EEC! 3.5.2-12 b-Aminohexanoate-cyclic-dimer hydrolase le-17 

EPIRKliD ligase Se-30 

EPIRKIO transmembrane protein Se-30 

EPIRKIO ATP Se-30 

20 EPIRKIO crown gall tumor le-21 

EPIRKIO mitochondrion 2e-13 

EPIRKIO purine nucleotide binding Se-30 

EPIRKIO P-loop Se-30 

EPIRKIO hydrolase 3e-31 

25 EPIRKIO biotin Se-30 

ESUPFAM3 amidase 3e-31 

ESUPFAH3 biotin carboxylase homology Se-30 
ESUPFAM3 indoleacetamide hydrolase 7e-12 
ESUPFAH3 lipoyl/biotin-binding homology Se-30 
[PROSITE] ATP_GTP_A 1
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 2.46 %
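The LOW_COMPLEXITY keyword appears to be the share of residues masked by SEG. A sketch of that calculation, assuming a mask string aligned one-to-one with the sequence in which masked residues are marked 'x' (the extracted SEG rows have lost their spacing, so the mask below is rebuilt from the counts); `low_complexity_percent` is a hypothetical name. The SEG rows of this report mask 12 + 1 = 13 of 528 residues, which gives 2.46 %.

```python
def low_complexity_percent(seg_mask: str) -> str:
    """Percentage of positions masked ('x') in a SEG mask, formatted to two decimals."""
    if not seg_mask:
        raise ValueError("empty mask")
    return f"{100 * seg_mask.count('x') / len(seg_mask):.2f}"


mask = "x" * 13 + "." * (528 - 13)  # 13 masked residues in a 528-residue protein
print(low_complexity_percent(mask))  # 2.46
```

The same ratio is consistent with the 35.35 % (693 residues) and 61.33 % (150 residues) values reported for the two DKFZphamy2_7j5 frames.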

35 SEfi NLGRSLREVSAALKdGfilTPTELCflKCLSLIKKAKFLNAYITVSEEVALKflAEESEKRYK 

SEG 

PRD ccchhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SE<3 NGflSLGDLDGIPIAVKDNFSTSGIETTCASNnLKGYIPPYNATVVflKLLDdGALLPIGKTN 
40 SEG 

PRD hccccccccccceeeecccccccccccchhhhhhhcccccchhhhhhhhhccceeeeccc 

SE<3 LDEFAUGSGSTDGVFGPVKNPIilSYSKRYREKRKdNPHSENEDSDIiJLITGGSPGGSAAAVS 

SEG xxxxxxxxxxxx 

45 PRD ccccccccccccccccccccccccceeecccccccccccccccccccccccccccccchh 

SEfl AFTCYAALGSDTGGSTRNPAAHCGLVGFKPSYGLVSRHGLIPLVNSNDVPGILTRCVDDA 
SEG x 

PRD hhhheeeecccccccccccccceeeecccccceeeeccceeeeecccccccccchhhhhh 


SEC AIVLGALAGPDPRDSTTVHEPINKPFMLPSLADVSKLCIGIPKEYLVPELSSEVflSLWSK 

SEG 

PRD hhhhhhhccccccccccccccccccccccccccccceeeecccccccccchhhhhhhhhh 

55 SE(3 AADLFESEGAKVIEVSLPHTSYSIVCYHVLCTSEVASNMARFDGL(3YGHRCDIDVSTEAri 

SEG 

PRD hhhhhhhhcceeeeeeccccceeeeeeeeehhhhhhhhhhhhhcccceeeccchhhhhhh 
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SE(3 YAATRREGFNDVVRGRILSGNFFLLKENYENYFVKAflKVRRLIANDFVNAFNSGVDVLLT 

SEG 

PRD hhhhhhcccchhhhhhhhhhheeeccccchhhhhhhhhhhhhhhhhhhhhhhhheeeeee 

5 SE<3 PTTLSEAVPYLEFIKEDNRTRSAfiDDIFTflAVNHAGLPAVSIPVALSNflGLPIGLcSFIGR 

SEG 

PRD cccccccccccccccccccccccccceeeeccccccccccccccccccccccceeeeeec 

SEfl AFCDflflLLTVAKUFEKflVflFPVIfiLflELflDDCSAVLENEKLASVSLKfl 

10 SEG 

PRD cccchhhhhhhhhhhhhhhhhheeehhhhhhheeeeccccceeeeccc 



Prosite for DKFZphfbr2_7flcl2.3

PS00017 112->120 ATP_GTP_A PDOC00017

(No Pfam data available for DKFZphfbr2_7flcl2.3)
DKFZphfbrS_7fldia 









group: brain derived 



DKFZphfbrS_7fldlfi encodes a novel 535 amino acid protein with weak
similarity to a human putative mitogen-activated protein kinase
kinase kinase.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of brain-specific genes.

similarity to putative mitogen-activated protein kinase kinase
kinase (Homo sapiens)

Sequenced by MediGenomix

Locus: unknown 



Insert length: 2158 bp

Poly A stretch at pos. 2138, polyadenylation signal at pos. 2117



1 ATCCGGGGCC CCGGAACCCG AGCTGGAGCT GAAGCGCAGG CTGCGGGGCG
51 CGGAGTCGGG AGTGCAGGCC TGAGTGTTCC TTCCAGCATG TCGGAGGGGG
101 AGTCCCAGAC AGTACTTAGC AGTGGCTCAG ACCCAAAGGT AGAATCCTCA
151 TCTTCAGCCC CTGGCCTGAC ATCAGTGTCA CCTCCTGTGA CCTCCACAAC
201 CTCAGCTGCT TCCCCAGAGG AAGAAGAAGA AAGTGAAGAT GAGTCTGAGA
251 TTTTGGAAGA GTCGCCCTGT GGGCGCTGGC AGAAGAGGCG AGAAGAGGTG
301 AATCAACGGA ATGTACCAGG TATTGACAGT GCATACCTGG CCATGGATAC
351 AGAGGAAGGT GTAGAGGTTG TGTGGAATGA GGTACAGTTC TCTGAACGCA
401 AGAACTACAA GCTGCAGGAG GAAAAGGTTC GTGCTGTGTT TGATAATCTG
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MSI ATTCAATTGG AGCATCTTAA CATTGTTAAG TTTCACAAAT ATTGGGCTGA 

501 CATTAAAGAG AACAAGGCCA GGGTCATTTT TATCACAGAA TACATGTCAT
551 CTGGGAGTCT GAAGCAATTT CTGAAGAAGA CCAAAAAGAA CCACAAGACG
601 ATGAATGAAA AGGCATGGAA GCGTTGGTGC ACACAAATCC TCTCTGCCCT
651 AAGCTACCTG CACTCCTGTG ACCCCCCCAT CATCCATGGG AACCTGACCT
701 GTGACACCAT CTTCATCCAG CACAACGGAC TCATCAAGAT TGGCTCTGTG
751 GCTCCTGACA CTATCAACAA TCATGTGAAG ACTTGTCGAG AAGAGCAGAA
801 GAATCTACAC TTCTTTGCAC CAGAGTATGG AGAAGTCACT AATGTGACAA
851 CAGCAGTGGA CATCTACTCC TTTGGCATGT GTGCACTGGA GATGGCAGTG
901 CTGGAGATTC AGGGCAATGG AGAGTCCTCA TATGTGCCAC AGGAAGCCAT
951 CAGCAGTGCC ATCCAGCTTC TAGAAGACCC ATTACAGAGG GAGTTCATTC
1001 AAAAGTGCCT GCAGTCTGAG CCTGCTCGCA GACCAACAGC CAGAGAACTC
1051 CTGTTCCACC CAGCATTGTT TGAAGTGCCC TCGCTCAAAC TCCTTGCGGC
1101 CCACTGCATT GTGGGACACC AACACATGAT CCCAGAGAAC GCTCTAGAGG
1151 AGATCACCAA AAACATGGAT ACTAGTGCCG TACTGGCTGA AATCCCTGCA
1201 GGACCAGGAA GAGAACCAGT TCAGACTTTG TACTCTCAGT CACCAGCTCT
1251 GGAATTAGAT AAATTCCTTG AAGATGTCAG GAATGGGATC TATCCTCTGA
1301 CAGCCTTTGG GCTGCCTCGG CCCCAGCAGC CACAGCAGGA GGAGGTGACA
1351 TCACCTGTCG TGCCCCCCTC TGTCAAGACT CCGACACCTG AACCAGCTGA
1401 GGTGGAGACT CGCAAGGTGG TGCTGATGCA GTGCAACATT GAGTCGGTGG
1451 AGGAGGGAGT CAAACACCAC CTGACACTTC TGCTGAAGTT GGAGGACAAA
1501 CTGAACCGGC ACCTGAGCTG TGACCTGATG CCAAATGAGA ATATCCCCGA
1551 GTTGGCGGCT GAGCTGGTGC AGCTGGGCTT CATTAGTGAG GCTGACCAGA
1601 GCCGGTTGAC TTCTCTGCTA GAAGAGACCT TGAACAAGTT CAATTTTGCC
1651 AGGAACAGTA CCCTCAACTC AGCCGCTGTC ACCGTCTCCT CTTAGAGCTC
1701 ACTCGGGCCA GGCCCTGATC TGCGCTGTGG CTGTCCCTGG ACGTGCTGCA
1751 GCCCTCCTGT CCCTTCCCCC CAGTCAGTAT TACCCTGTGA AGCCCCTTCC
1801 CTCCTTTATT ATTCAGGAGG GCTGGGGGGG CTCCCTGGTT CTGAGCATCA
1851 TCCTTTCCCC TCCCCTCTCT TCCTCCCCTC TGCACTTTGT TTACTTGTTT
1901 TGCACAGACG TGGGCCTGGG CCTTCTCAGC AGCCGCCTTC TAGTTGGGGG
1951 CTAGTCGCTG ATCTGCCGGC TCCCGCCCAG CCTGTGTGGA AAGGAGGCCC
2001 ACGGGCACTA GGGGAGCCGA ATTCTACAAT CCCGCTGGGG CGGCCGGGGC
2051 GGGAGAGAAA GGTGGTGCTG CAGTGGTGGC CCTGGGGGGC CATTCGATTC
2101 GCCTCAGTTG CTGCTGTAAT AAAAGTCTAC TTTTTGCCAA AAAAAAAAAA
2151 AAAAAAAA



BLAST Results 



40 

No BLAST result 



Medline entries 

45 

No Medline entry 



50 

Peptide information for frame 1 



ORF from flfl bp to 11^2 bpn peptide length: 535 
55 Category: similarity to unknown protein 
Classification: Protein management 

1 MSEGESflTVL SSGSDPKVES SSSAPGLTSV SPPVTSTTSA ASPEEEEESE 
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51 DESEILEESP CGRWflKRREE VNtiRNVPGID SAYLAMDTEE GVEVVUNEVfl 

101 FSERKNYKL<2 EEKVRAVFDN LIflLEHLNIV KFHKYIiJADIK ENKARVIFIT 

151 EY!1SSGSLK<2 FLKKTKKNHK TNNEKAUKRIil CTfllLSALSY LHSCDPPIIH 

EDI GNLTCDTIFI (2HNGLIKIGS VAPDTINNHV KTCREECJKNL HFFAPEYGEV 

5 B51 TNVTTAVDIY SFGMCALEHA VLEIflGNGES SYVP<2EAISS AlfiLLEDPLfi 

301 REFI<2KCL<2S EPARRPTARE LLFHPALFEV PSLKLLAAHC IVGHflHUIPE 

351 NALEEITKNH DTSAVLAEIP AGPGREPVflT LYSflSPALEL DKFLEDVRNG 

M01 IYPLTAFGLP RP(2<2P(2<3EEV TSPVVPPSVK TPTPEPAEVE TRKVVLMflCN 

MSI IESVEEGVKH HLTLLLKLED KLNRHLSCDL flPNENIPELA AELViSLGFIS 

10 501 EADflSRLTSL LEETLNKFNF ARNSTLNSAA VTVSS 



BLASTP hits 

15 

No BLASTP hits available 

Alert BLASTP hits for DKFZphf br2_7fldlfi-i frame 1 

20 TREMBL:AC00SMb.5_m gene: "TTJm-lM"; product: "putative mitogen 
activated protein kinase kinase"^ Arabidopsis thaliana 
chromosome III 

BAC TUlt genomic sequence! complete sequence- 1 N = li Score = 
372-. P = 
25 l-^e-33 

TREPIBL : AFm5bT0_l gene: "BcDNA • LD2flb57"i product: 
"BcDNA. LD2flb57"i 

Drosophila melanogaster clone LD2flbS7 BcDNA • LD2flb57 
30 ( BcDNA • LD2flb57) 

mRNA-. complete cds-i N = l-i Score = imOi P = 1.3e-115 

PIR:T02151 probable mitogen activated protein kinase - ricei N s 
1-. 

35 Score = 311-, P = me-3S 



>TREnBL:AFlM5b c 50_l gene: "BcDNA • LD2flb5?"i product: 
"BcDNA -LDSflb 57" i 
40 Drosophila melanogaster clone LD2flb57 BcDNA • LD2flb57 

(BcDNA. LD2flb57) mRNAi 
complete cds. 

Length = b37 



45 HSPs: 



Score = HMD (171-0 bits)-. Expect = 1.3e-115-. P = 1.3e-115 
Identities = 230/lbS CH*)-. Positives = 30M/4b5 (b5*) 

50 fluery: bl 

CGRU<2KRREEVN<2RNVPGIDSAYLAMDTEEGVEVVli)NEV<2FSERKNYKL<2EEKVRAVFDN 120 
CGRU KRREEV+fiR+VPGID +LAMDTEEGVEVVUNEV<2++ + K 

(JEEK+R VFDN 
Sbjct: 102 

55 CGRULKRREEVDl3RDVPGIDCVHLAHDTEEGVEVVUNEVl2YASLflELKS(2EEKI1R(3VFDN Ibl 

fiuery: 121 LlflLEHLNIVKFHKYUADIKE- 
NKARVIFITEYflSSGSLKfiFLKKTKKNHKTIINEKAIilKR 171 
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L+(2L+H NIVKFH+Yld D ++ + RV+FITEYMSSGSLKflFLK+TK+N K 

+ ++U+R 
Sbjct: lb2 

LL(2LDH(JNIVKFHRYlJT]>T(3(JAERPRVVFITEYriSS6SLK(3FLKRTKRNAKRLPLESIiJRR 221 

5 

fluery: 160 

UCTfllLSALSYLHSCDPPIIHGNLTCDTIFIflHNGLIKIGSVAPDTINNHVKTCREEOKN 231 
UCTQILSALSYLHSC PPIIHGNLTCD+IFIflHNGL+KIGSV PD ++ V+ 

RE ++ 
10 Sbjct: E2E 

IJCTlJILSALSYLHSCSPPIIHGNLTCDSIFIflHNGLVKIGSVVPDAVHYSVRRGRERERE 2fll 

fiuery: 210 LHFF-APEYGEVTNVTTAVDIYSFGPICALEMAVLEIfl- 

GNGESSYVP<2EAISSAI(2 2=13 
15 H+F APEYG +T A+DIY+FGMCALEMA LEI(2 N ES+ + 

+E I I 
Sbjct: 262 

RERGAHYFflAPEYGAADdLTAALDIYAFGHCALEMAALEIflPSNSESTAINEETIfiRTIF 3M1 

20 fluery: STH LLEDPLflREFIflKCLflSEPARRPTARELLFHPALFEVPSLKLLAAHCIV 

GHflHMIPE 350 

LE+ LI2R+ I+KCL +P RP+A +LLFHP LFEV SLKLL AHC+V 

++ n e 

Sbjct: 3^2 

25 SLENDL(3RI>LIRKCLNP(JP(JI>RPSANDLLFHPLLFEVHSLKLLTAHCLVFSPANRTriFSE 401 
fluery: 351 NALEEITKNM- 

DTSAVLAEIPAGPGREPVflTLYSflSPALELDKFLEDVRNGIYPLTAFGL MOT 

A + + + V+A++ G+E L S A +L+KF+EDV+ 

30 G+YPL + 

Sbjct: MQa 

TAFDGLMflRYYflPDVVMAflLRLAGGdERflYRLADVSGADKLEKFVEDVKYGVYPLITYS- HbD 
fluery: MID 

35 PRXXXXXXXXXXXXXXXXXXXXXXXXXAEVETRKVVLIIflCNIESVEEGVXXXXXXXXXXX HbT 

+ + E+R++V n C+++ E+ 

Sbjct: Hbl 

GKKPPNFRSRAASPERADSVKSATPEPVDTESRRIVNUMCSVKIKEDSNDITHTILLRMD S2D 

40 fiuery: M7D XXXXXXXSCDLI1PNENIPELAAELV<2LGFISEAD<3SRLTSLLEETL SIS 

+ C + N+ +L +ELV+LGF+ D<3 ++ LLEETL 
Sbjct: 521 DKHNRfiLTCflVNENDTAADLTSELVRLGFVHLDDflDKIflVLLEETL Sbb 

45 Pedant information for DKFZphf br2_7fldlfli frame 1 

Report for DKFZphf br2_76dlfl ■ 1 

50 

ELENGTHJ 5bM 
Midi b2MbH.fi? 
EpIJ 5.10 

EHOJ10LJ TREPIBL : AFlHSbTO^ gene: n BcDNA.LD2flb57"i product: 

55 "BcDNA .LD2flbS7"i Drosophila melanogaster clone LD2fib57 
BcDNA- LD2Bb5? (BcDNA • LD2flb57 ) mRNAi complete cds- le-123 
EFUNCAT]] D3.22 cell cycle control and mitosis ES. cerevisiae-. 
YJLD^SwJ be-15 
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CFUNCAT3 30-03 organization of cytoplasm ES • cerevisiae-. 
YJLDT5wJ be-15 

EFUNCAT3 11. 01 stress response ES- cerevisiae-, YJLDTSwl be-15 
EFUNCATl 03-01 cell growth ES. cerevisiae-. YJLOISw]] be-15 
5 EFUNCATJ 1Q.0S-11 key kinases ES - cerevisiae-, YJLQ^Sw]] be-15 

EFUNCATH 03. OM budding-i cell polarity and filament formation 

ES- cerevisiae-i YJLDTSuO be-15 
EFUNCATJ Tfi classification not yet clear-cut ES- cerevisiae-, 

YLRDTbwJ Ee-D"! 

10 EFUNCAT]! 30-02 organization of plasma membrane ES- cerevisiae-, 
YLROTbwJ Ee-OI 

EFUNCATJ 10-03-11 key kinases ES- cerevisiae-, YNRD31c3 3e-0T 

EFUNCATD 0=1.01 biogenesis of cell wall ES- cerevisiae-, 

YNR031c3 3e-0T 

15 EFUNCATJ 03-07 pheromone response-, mating-type determination-, 
sex-specific proteins ES. cerevisiae-, YLR3bEw3 Me-Qfl 
EFUNCATJ 10.05-11 key kinases ES • cerevisiae-. YLR3bEw3 He-05 

EFUNCAT3 10.0M.11 key kinases ES* cerevisiae-. YLR3bEw3 He-OS 

EFUNCAT3 11. 0M dna repair (direct repair-, base excision repair 

20 and nucleotide excision repair) ES. cerevisiae-, YPL153cJ le-07 
EFUNCAT3 03-n recombination and dna repair ES- cerevisiae-, 

YPL153c3 le-07 

EFUNCAT3 03.SE-01 cell cycle check point proteins ES- 
cerevisiae-. YPL153cJ le-07 
25 EFUNCAT]] 30-10 nuclear organization ES- cerevisiae-i YPL153c3 
le-07 

EFUNCATJ 03. ES cytokinesis ES- cerevisiae-. YDRS07c]l le-07 
EFUNCATJ 10. T=l other signal-transduct ion activities ES - 
cerevisiae-. YPLlS3c3 le-07 
30 EFUNCATJ 03-13 meiosis ES- cerevisiae-, Y»RSE3c3 3e-07 

EFUNCATJ 03-10 sporulation and germination ES- cerevisiae-. 
YDRSS3c3 3e-07 

EFUNCATJ 03. lb dna synthesis and replication ES- cerevisiae-, 

YUROOlcJ Se-Ob 

35 EFUNCATJ n unclassified proteins ES. cerevisiae-, YDRMTOcJ 

3e-0S 

EFUNCAT3 05-07 translational control ES- cerevisiae-, Y»REfi3c3 
le-QH 

EFUNCATI 01-05- OH regulation of carbohydrate utilization ES - 
40 cerevisiae-, YDR l 477w]l le-OH 
EBLOCKSJ PF00b37A 
EBLOCKSJ BP03niJ 
EBLOCKSJ PF01317B 

ESC0P3 dlir3a_ S-l-l-E-b insulin receptor Complex 

45 (transf erase/substrate) Ee-53 

ESC0P31 dlphk 5-1.1. Lb gamma-subunit of glycogen 

phosphorylase kinas 3e-bfl 

ESCOPJ dlfgkb_ 5-1.1-S.5 Fibroblast growth factor 

receptor 1 Ehuman (Horn le-55 

50 ESC0P3) dlabo S-LLl-lM Protein kiase CKE-, alpha 

subunit Eilaize (Ze Ee-5S 

ESCOPJ d31ck S-Ll-E-E Lymphocyte kinase (lck) EHuman 

(Homo sapiens) 7e-5 l 4 

ESC0P3 dSerk 5.1.1-1.11 MAP kinase ErkE Erat (Rattus 

55 norvegicus) Te-71 

ESC0P3 dlcdkb_ 5-1.1.1-2 cAllP-dependent PK-, catalytic 

subunit Comple le-55 
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40 



WO 01/98454 
ESC0P1 

(Homo sapiens) 

EEC] 5.?. 

EEC1 2.7. 

EPIRKliJl 

EPIRKU3 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

EPIRKUl 

ESUPFAfO prot 
ESUPFAMl unas 
13 

ESUPFAMl 



PCT/IB01/02050 

dlhcl 5.1.1.1.1 Cyclin-dependent PK EHuman 

Me-b? 

1.112 Protein-tyrosine kinase Me-Dt 
1.37 Protein kinase 3e-CH 

phosphotransferase 2e-2S 

nucleus 3e-Qb 

RNA binding 3e-lQ 

tandem repeat 4e-D7 

cell cycle control 3e-Db 

serine/threonine-specif ic protein kinase 2e-13 
transmembrane protein 1e-D7 
autophosphorylation 3e-10 
tyrosine-specif ic protein kinase le-Db 
magnesium <4e-Q7 
ATP 2e-13 
receptor He-07 
phosphoprotein 2e-13 
apoptosis 3e-0b 
glycoprotein Me-Q7 
protein kinase 2e-2fl 
signal transduction 2e-0fl 
cell division le-11 
calmodulin binding 3e-Db 
ein kinase byr2 le-Db 

signed Ser/Thr or Tyr-specific protein kinases 2e- 



leucine-rich alpha-2-glycoprotein repeat homology Me-07 

ESUPFAMl double-stranded RNA-binding repeat homology 3e-lQ 

ESUPFAMl SAM homology le-Ofa 

ESUPFAMl death-associated protein kinase 3e-Dt 

ESUPFAMl ankyrin repeat homology 3e-Db 

ESUPFAMl protein kinase homology 2e-2fl 

ESUPFAMl kinase-related transforming protein 2e-0b 

ESUPFAMl protein kinase SPK1 3e-0b 

ESUPFAMl protein kinase Xa21 4e-07 

ESUPFAMl protein kinase TIK 3e-lD 

ESUPFAMl kinase interaction domain homology 3e-Db 

EPFAM1 Eukaryotic protein kinase domain 

EKtO All_Alpha 

EKU! 3D 

EKU1 L0U_C0MPLEXITY lb. 41 X 



45 SEfl IRGPGTRAGAEAfiAAGRGVGSAGLSVPSSMSEGESflTVLSSGSDPKVESSSSAPGLTSVS 

SEG • • • -xxxxxxxxxxxxxxxx xxxxx 

IkobA 



50 SEC PPVTSTTSAASPEEEEESEDESEILEESPCGRUCJKRREEVNflRNVPGIDSAYLAMDTEEG 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

IkobA 



55 SEl3 VEVVUNEVflFSERKNYKLlJEEKVRAVFDNLIflLEHLNIVKFHKYUADIKENKARVIFITE 

SEG 

IkobA CHHHHHHHHHHHHHHHTTTBTTBCCEE 

EEEETTTEEEEEEC 
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SE<J YnSSGSLK(3FLKKTKKNHKTI1NEKAlJKRIj)CT(2ILSALSYLHSCDPPIIHGNLTCDTIFI(2 

SEG 

IkobA CCCCEEH — HHHHCTTTTC-CCHHHHHHHHHHHHHHHHHHH — 

5 HHCEETTTTTTTTEETT 

SE<2 HNGLIKIGSVAPDTINNHVKTCREEflKNLHFFAPEYGEVTNVTTAVDIYSFGMCALEIlAV 

SEG 

IkobA 

10 TTCCEEECCTTTTEECTTTTEEEEETTTGGGCCHHHHHCCCBCHHHHHHHHHHHHHHHHC 

SEfl LEI(2GNGESSYVP(3EAISSAI(3LLEDPLi3REFI(3KCLi3SEPARRPTARELLFHPALFEVP 

SEG 

IkobA 

15 CCTTTTCCCHHHHHHHHHHCCCCTTTHHHHHHHHHTTTTTGGGCCCHHHHHHTTTT 

SEfl SLKLLAAHCIVGHdHniPENALEEITKNMDTSAVLAEIPAGPGREPVflTLYSfiSPALELD 

SEG 

IkobA 



20 



25 



30 



SE<2 KFLEDVRNGIYPLTAFGLPRP(3(JP(2flEEVTSPVVPPSVKTPTPEPAEVETRKVVLf1flCNI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxx 

IkobA 



SEfl ESVEEGVKHHLTLLLKLEDKLNRHLSCDLMPNENIPELAAELV<2LGFISEAD(2SRLTSLL 

SEG xxxxxxxxxxxxxxxxxx 

IkobA 



SE(2 EETLNKFNFARNSTLNSAAVTVSS 

SEG 

IkobA 



35 

(No Prosite data available for DKFZphf br2_7fldlfi - 1) 

40 Pfam for DKFZphf br2_7fldlfl .1 

HF1PI_NA[1E Eukaryotic protein kinase domain 
45 H MM 

*rLnHPNIIRFYDwFed- • -ddDHIYrilllEYneGGDLFDYIrrng p 

+L H NI++F ++ D + ++ +I+EYPI G+L +++++ + 

Query 1SS 

<2LEHLNIVKFHKYIilADIKENKARVIFITEYI1SSGSLK(2FLKKTKKNHKT SOD 

50 

WW 

UsEwelrf IMyfllLrGMeYLHSHg. . IIHRDLKPENILIDeNgqIKIcDF 

M+E+ +++ +AIL++++YLHS IIH L + I+I +NG 

IKI+ 

55 fluery 2D1 

HNEKAUKRIilCT(3ILSALSYLHSCI>PPIIHGNLTC]>TIFIt3HNGLIKIGSV 250 
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win 

GLARqMnnYerMttfCGTPUYMnAPEVIImgnyYttkVDMblSFGCILIiJEri 

++ N+ + + + APE + ++ TT+VD++SFG+ 

EM 

fluery 251 APDTINNHVKTCREE(2KNLHFF-APEY- 

GEVTNVTTAVDIYSFGHCALEM 214 

HMI1 

f1TGepPFyddnMemImrIiqrfrrpfUpnCSeElyl>FrirwCUnyDPekRP 

+ ++ + N E + ++ + ++ + + ++F+ +C++ 

P++RP 

Uuery 211 A— VLEIfl- 

GNGESSYVP(2EAISSAI<2LLEI>PL<3REFIl2KCL<3SEPARRP 3^45 

HMM TFrfllLnHPWF* 

T+R++L HP + 
fluery 3m= TARELLFHPAL 35b 
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group: transmembrane protein 

5 

DKFZphf br2_7fldM encodes a novel lfifi amino acid protein without 
similarity to known proteins. 

The novel protein contains 1 transmembrane region and a 
10 Cytochrome c family heme-binding site- 
No informative BLAST results* No predictive prositei pfam or SCOP 
motif e- 

The new protein can find application in studying the expression 
15 profile of brain-specific genes and as a new marker for amygdala 
cells- 



weak similarity to hypothetical protein of Arabidopsis thaliana 

20 

perhaps complete cds- 
Pedant: TRANSMEMBRANE 1 

Sequenced by MediGenomix 

25 

Locus: unknown 

Insert length: 1S"47 bp 

Poly A stretch at pos- 1527-. polyadenylation signal at pos- 15D8 

30 

1 TTGCCGCCGC CGCCACCCCC GCCCAGGATG GCGGAAGTGG AGGCGCCGAC 

51 GGCGGCCGAG ACGGACATGA AGCAATATCA AGGCTCCGGC GGCGTCGCCA 

1D1 TGGATGTGGA ACGGAGTCGC TTCCCCTACT GCGTGGTGTG GACGCCCATC 

35 151 CCGGTGCTCA CGTGGTTTTT CCCCATCATC GGCCACATGG GCATCTGCAC 

2D1 ATCCACAGGA GTCATTCGGG ACTTCGCGGG CCCCTACTTT GTCTCAGAGG 

251 ACAACATGGC CTTTGGAAAG CCTGCCAAGT ACTGGAAGTT GGACCCTGCT 

301 CAGGTCTATG CTAGCGGGCC CAACGCATGG GACACGGCTG TGCACGACGC 

351 CTCTGAGGAG TACAAGCACC GCATGCACAA TCTCTGCTGT GACAACTGCC 

40 i»Dl ACTCGCACGT GGCATTGGCC CTGAATCTGA TGCGCTACAA CAACAGCACC 

tJSl AACTGGAATA TGGTGACGCT CTGCTTCTTC TGCCTGCTCT ACGGGAAGTA 

501 CGTCAGCGTT GGGGCCTTCG TGAAGACCTG GCTGCCCTTC ATCCTTCTCC 

551 TGGGCATCAT CCTCACCGTC AGCCTGGTCT TTAACCTCCG GTGATGGCTG 

bDl CTCGGTGGCC CCACACCCAC CAGGGTCCCG AGGAAACAGC CGCCATCCCT 

45 LSI TTTGGTTCCA GATTTTTTTC TCCTCACCCC AAAAGGCAGG GTTGGGCCT6 

701 CTGTTGTGGA CCGGGGGTCG GGGCTGGCAG GATGGAAGGA CTGAGGACCA 

751 GCATGAAGTG GGGGTTTGTT GTCTCCCTGC CTCTCAGAAG CACCCTGTCC 

flOl CCTCCTCCCC AGGCCTGTGA CTCCGGCCCT GGAAGCCCCT TTGTTCTTCT 

flSl GTTGAAAGGC TTTGGCTTCC CTCTGTAGAG CTGCTCCCGC CACCACCTGC 

50 «101 TGGGGTCCTG CCTCAGCCCA GTGCCCAGTA TGGGGAGAGG AGGACATTTG 

=151 GGCTCACCTG TCAAGGTGGC CCTGGGACCA GAGCTGGTCC CAGCATGGGG 

1001 T6CACCGGGT ACACTTAACG TGTCTCTATA AGCCAAGTTG CTTCAGGACC 

1D51 TTCACCACTG GCCTCTAGAA TGGTCCAGAG GGGCTGGCTG GGTCCCTTTG 

1101 TCAGACTCCT GCCGGCAGCT GCCCTG6GGG ACATGTGTGC CCATCTGGCA 

55 1151 TCCTCCAGCC CGTGCAGTCC GCTCTTCACT GTTCCACGGC CTCCCAGTGC 

1201 CTCCCAGCAT TGGACCCATC TCCCCCTGCA GTTTGAGGCC AGAGAGGTGA 

1251 GTGGACCTGA CAAGTGCCAG AGTAACCGTG TAGACAGAGC AGTGTAGACA 

1301 GCGCTCAGCC CCAGCCCCAG GTGTGGACCT CATGCTGGTG ATGGCTCCCC 
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1351 TGGGTG6CCT GCCAGCACAG CCAGTGCCAT CAGGGAGCTG AAGGGGCTGT 
1MD1 CCCCCACCTA ACTCCAGCTC CCCCTTCACG TTGTCACCAA GGCCCTGTGC 
1HS1 CGCCCGCCTC GCCCCCCTGC TCTGTGGATT CCTTTGGGAA GGGCTCCCTG 
1501 GGCAGGACAA TAAAGAGTTT TGACTCCAAA AAAAAAAAAA AAAAAAA 



BLAST Results 



10 Entry TOEblb from database PIR:- 

hypothetical protein TlTLlfl-lS - Arabidopsis thaliana 
Score = EET-i P = 1.3e-17i identities = 57/lbl-i positives = 
76/lbl-, 
frame +1 



Medline entries 



20 

No Medline entry 



25 Peptide information for frame 1 



ORF from 2fl bp to 511 bpi peptide length: iflfl 
Category: similarity to unknown protein 
30 Classification: no clue 

Prosite motifs: CYTOCHROMES (ISl-llT) 



1 MAEVEAPTAA ETDMKflYfiGS GGVAMDVERS RFPYCVVUTP IPVLTUFFPI 

35 51 IGHMGICTST GVIRDFAGPY FVSEDNMAFG KPAKYUKLDP AflVYASGPNA 

101 UDTAVHDASE EYKHRMHNLC CDNCHSHVAL ALNLMRYNNS TNUNMVTLCF 

151 FCLLYGKYVS VGAFVKTULP FILLLGIILT VSLVFNLR 



40 

BLASTP hits 

No BLASTP hits available 

45 Alert BLASTP hits for DKFZphf brE_7fldM -, frame 1 

PIR:T02blb hypothetical protein TllLlfl.lE - Arabidopsis thaliana-i 
N = 

E-, Score = SEb-, P = M-Se-El 



>PIR:TD2blb hypothetical protein TlTLlfl.lE - Arabidopsis thaliana 
Length = Eb7 

55 HSPs: 

Score = 22b (33. 1 bits)-. Expect = M-Se-Eli Sum P(E) = H.5e-21 
Identities = 5S/13E (3^)-. Positives = 71/13S (53*) 
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fluery: 2S 

MDvERSRFPYCVVWTPIPVLTIdFFPIIGHMGICTSTGVIRDFAGPYFVSEDNIIAFGKPAK 84 

+D ++S+FP C+VUTP+PV++U P IGH+G+C GVI DFAG F++ D+ 

5 AFG PA+ 

Sbjct: bl 

IDTKKSKFPCCIVUTPLPVVSWLAPFIGHIGLCREDGVILDFAGSNFINVDDFAFGPPAR 12Q 

fluery: AS YblKLDPAdVYASGPNAlilDTAVHDASEEYKHRMHNLC-- 
10 CDNCHSHVALALNLMRYNNST- mi 

Y +LD + PN H +KH DN S + 

YN T 

Sbjct: 121 YLflLDRTKCCLP-PNMGG--- 
HTCKYGFKHTDFGTARTUDNALSSSTRSFEHKTYNIFTC 17b 

15 

(Juery: 142 NUIN-MVTLCFFCLLYG 15b 

N + V C L YG 
Sbjct: 17? NCHSFVANCLNRLCYG 1=12 

20 Score = 157 (23-b bits)-. Expect = l.fle-13-, Sura P(2) = l.Be-13 
Identities = 27/fll (33JO-. Positives = SO/ai Cbl*) 

Query- 1D1 

WMAVHDASEEYKHRMHNLCCDNCHSHVALALNLMRYNNSTNIilNMVTLCFFCLLYGKYVS IbO 
25 WD A+ ++ ++H+ +N+ NCHS VA LN + Y S UNMV + 

++ GK+++ 
Sbjct: 155 

WDNALSSSTRSFEHKTYNIFTCNCHSFVANCLNRLCYGGSMEUNMVNVAILLMIKGKUIN 214 

30 fluery: Ibl VGAFVKTULPFILL — LGIIL 17=J 

+ V+++LP ++ LG++L 
Sbjct: 215 GSSVVRSFLPCAVVTSLGVVL 235 

Score = 3b (5-4 bits)-, Expect = 4-5e-21i Sum P<2) = 4.5e-21 
35 Identities = 7/21 (33*), Positives = 14/21 (bb*) 

fluery: 10 AETDMKflYCJGSGGVAMDVERS 30 

++ ++K +G G MD++RS 
Sbjct: 12 SDRNLKMSRGRGVPMMDLKRS 32 

40 

Pedant information for DKFZphf br2_78d4i frame 1 
45 Report for DKFZphf br2_7Sd4 .1 



ELENGTH3 Iflfl 

CMIO 2117a. bb 

50 CpIH b-27 

EH0M0L3 PIR:T02blb hypothetical protein TllLlfl.12 - 

Arabidopsis thaliana 7e-32 

CPROSITEJ CYTOCHROMES 1 

EKUJ TRANSMEMBRANE 1 



55 



SE<3 MAEVEAPTAAETDMKflYtJGSGGVAMDVERSRFPYCVVUTPIPVLTlilFFPIIGHMGICTST 
PRD cccccchhhhhhhhhhccccccccccccccccccceeeccceeeeeeeeecccceeecce 



-225- 



WO 01/98454 



PCT/EB01/02050 



SEfl GVIRDFAGPYFVSEDNMAFGKPAKYUKLDPAfiVYASGPNAUDTAVHDASEEYKHRMHNLC 

PRD eeGeccccccccccccccccccceeeeccccceeeccccccccccccccchhhhhhhhee 

5 HEM 

SE(J CDNCHSHVALALNLHRYNNSTNUNMVTLCFFCLLYGKYVSVGAFVKTIiILPFILLLGIILT 

PRD ecccchhhhhhhhhhhccccccchhhhhhhhhhhccceeeeeeeeeeeccceeeeceeec 

mem miririniinnnrinn 



10 



SE<2 
PRD 
MEM 



VSLVFNLR 



ceeeeccc 



MMMMM- - . 



15 



Prosite for DKFZphf br2_7fldM • 1 



PSDD11D 



121->127 



CYT0CHR0ME_C 



PDOCODlt,^ 



20 



(No Pfam data available for DKFZphf br2_7fldt».l) 
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WO 01/98454 
DKFZphfbrE_7flelfl 



PCT/IB01/02050 



5 group: brain derived 

DKFZphf brE_76el6 encodes a novel 3D? amino acid protein without 
similarity to known proteins- 

10 The mRNA is differentially polyadenylated- 

No informative BLAST results^ No predictive prositei pfam or SCOP 
motif e- 

The new protein can find application in studying the expression 
15 profile of brain-specific genes- 



similarity to hypothetical protein of Arabidopsis thaliana 

20 differential polyadenylation 
> 7 exons 

complete on human genomic clone 451BElap- 
perhaps complete cds- 

25 Sequenced by HediGenomix 

Locus: /map= n 144-50 cR from top of Chrb linkage group" 

Insert length: 30Tb bp 
30 Poly A stretch at pos- 3075i polyadenylation signal at pos- 3047 



1 TGGTGAGTTC GGAGTAGAGA TGGCCGCGCT TGCACCGCTG CCCCCGCTCC 
51 CCGCACAGCT CAAGAGCATA CAGCATCATC TGAGGACGGC TCAGGAGCAT 
35 101 GACAAGCGAG ACCCTGTGGT GGCTTATTAC TGTCGTTTAT ACGCAATGCA 
151 GACTGGAATG AAGATCGATA GTAAAACTCC TGAATGTCGC AAATTTTTAT 
S01 CAAAGTTAAT GGATCAGTTA GAAGCTCTAA AGAAGCAGTT GGGTGATAAT 
E51 GAAGCTATTA CTCAAGAAAT AGTGGGCTGT GCCCATTTGG AGAATTATGC 
301 TTTGAAAATG TTTTTGTATG CAGACAATGA AGATCGTGCT GGACGATTTC 
40 351 ACAAAAACAT GATCAAGTCC TTCTATACTG CAAGTCTTTT GATAGATGTC 
401 ATAACAGTAT TTGGAGAACT CACTGATGAA AATGTGAAAC ACAGGAAGTA 
451 TGCCAGATGG AAGGCAACAT ACATCCATAA TTGTTTAAAG AATGGGGAGA 
501 CTCCTCAAGC AGGCCCTGTT GGAATTGAAG AAGATAATGA TATTGAAGAA 
551 AATGAAGATG CTGGAGCAGC CTCTCTGCCC ACTCAGCCAA CTCAGCCATC 
45 tOl ATCATCTTCA ACTTATGACC CAAGCAACAT GCCATCAGGC AACTATACTG 
LSI GAATACAGAT TCCTCCGGGT GCACACGCTC CAGCTAATAC ACCAGCAGAA 
701 GTGCCTCACA GCACAGGTGT AGCAAGTAAT ACTATCCAAC CTACTCCACA 
751 GACTATACCT GCCATTGATC CCGCACTTTT CAATACAATT TCCCAGGGGG 
601 ATGTTCGTCT AACCCCAGAA GACTTTGCTA GAGCTCAGAA GTACTGCAAA 
50 651 TATGCTGGCA GTGCTTTGCA GTATGAAGAT GTAAGCACTG CTGTCCAGAA 
=101 TCTACAAAAG GCTCTCAAGT TACTGACGAC AGGCAGAGAA TGAAGCCTTT 
■351 GTATGACAGA CCCATGTATT TTTGGCATGA GGAACTAACA GTCCATTACT 
1001 CTATCTTCAG CCTATCAGGA TCACAGTTTT AAGGAAGACT TGGTTTTGTT 
1051 GAATATGACA ATGAAATCTG TGTGTATCAG ATTTTTATTG AAGCATTCAT 
55 1101 CAGCAGCCTC AACCAGTTTT CATTGTCCAT TTACTAGATT CAATCGTCTC 
1151 TGAGTATATA GGGCTGATGT TAGCAAGACC CTAAAAATGT CCATTGAACC 
1E01 CTGCTTCAAA AAATGAAAAC ACACCTCTAT AAAATGTGTA CTGGGAATAA 
1H51 GCTTTGTATT TACATACATT AGGGGAATTT TTTAAAATCT GTAATGTTTG 
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1301 GACAAACAGA TGATATTACT TTGCTATAAA ATTATAAATG TAACTTTTAA 

1351 TAAAGATAGC CAGAATATTC TAAATTAGAA ATTACGTTTT TGTTTCCCTC 

mOl AAGACATAAA ACAAATATAA ACATTCTAAA CTGCTGGATG AATCTGAAAA 

1MS1 GACATTAAGT TCAAATTTTA ATTTATTCTC ATATTAAATA TAACTCCATT 

5 1SD1 AAAAGTTTAA AATTTCATGG GAGAAAATAT AATAAGGTAA AGAGGTAGAA 

1SS1 TCACTTTCAG ACTTAAGAAT AATGTTGATT TCCCAAGTGC TTTACCTTAT 

IbOl CTGTTAAAGC GTAAGATGAA TTGGTATTTG CTTCATAGGC AGTTTGACTG 

lb51 CATGTATTAG AGAATGAAAA GAAGATATTT GTAGTAATGC CTGGAAACTT 

17D1 GGTGCTTTAA ATTAAGGTAC TCCTCTGCTG CTGTAGAATG GATTCCACAC 

10 17S1 AGTGGATAGC TATGGGTGAT TCAGAATATT ATGTTTAGAT TCCCATTTGT 

lflOl TAAGTTTATA AGTTTTGTGG GGAATTATGA ACTTACTGTG TACTACCTGC 

1A51 ATTTGTGCTG TGTGAAAAAT AAATACAAGG ATTCGTTTAG CTAATTCAAC 

1101 TTACTACAAA GACAAATGTC TGTTTTTATT TGCCTGCTAG GATTGTCTTT 

1151 TTTAAAAGTC ATTTTTATTT ATAGGAATAT GGGTGTTTCT ATAGGAAGAA 

15 2001 ACAGGTTTTT TGTTTTTTGT TTTTTAAGAT AAATTTGACA AAGTTAACTG 

2051 AAATTTATCT GGTCCATTTT ATTCATGCTA CTAAGATGGG AATCTTTAAA 

2101 CACAAGGGTC AGCAAGCTTT GGCCCATGGA TTGGCCACCT GTTACGTAAA 

2151 TAAAGTTTCT TTGAAACAAG CCTACACTCA TTCATTTATG TTTTGTCTGT 

2201 GGTTGCTTTC CACAACTGCA GAGTTGTATG GCTTGCAAGT CTAAAAACAT 

20 2251 TTACTATTTG GCCCTCTAAG AAAAAGTTAA GACACCTAGT CTAATGGCCT 

2301 TTTGGGAAAA AACAAATCAC TAACTCATAA TCATTTATAT CCATTATTTT 

2351 CTGCATAAAT GTAATGCTAT TGTACAGGGT TTGGTAGAAT AAATATTCAG 

2f01 ACTGACTAAA CTGTTCTAAA TTCTCACAAA AAAGTCCCCA AACAACATGC 

2"4S1 CTCCTAAAAA ACATTTTCCT ATCTTTTACA AGAGGTATGA ACATTTGTAG 

25 2501 GGTTCCACAT TTGCATCTAG AAATCCAATG CTCTTTAGAA TGTTATTACG 

2551 AATAGAAAGA TGGCCAGGAT GACCTTTAGT GTTACATGAT GTTCAGCAAA 

2t01 TTTTAATTCA AACCTTGATA TGCCTGGACA CTGAAAAGTA AACGCATCAC 

2b51 CTCCTATTTT ATACCCTACC TTCTGGTTCC CAATTGGGAG AGCACATAGA 

2701 GGGAAGGAGA CAATATAGAA ACTACGGAGT CCGCTGGTAG TGGGCTGCAT 

30 2751 GGTGTGACAG AGCCCTTCTC TGTAAAATGG AAATGACACC ACTAGCCATC 

2fi01 TCAATAGTTA CAAGAATTAA AAGAGATACA GTACCTGAAG TGCTTAGCGC 

2fi51 ATGGTAGCAT TTCATAAATG TTTAGTGTCA ATACTAATGC TCTAATAATG 

2101 TAAATTGTTA ATAATTTATT TCCCTAATAT CAGGAAATCC CAGTTGTCTA 

2151 TGTGGCCCAG TGCTTAAAAA CGCCTTCTTG CATGAGGGGA TTGAACTATA 

35 3001 CAATGTTTGT TAACTTTGTA TTTGTATTTT TTCCTATAAA ATCTTAAAAT 

3051 AAAATTAGGA GATGTGTTCT GATGTAAAAA AAAAAAAAAA AAAAAA 



BLAST Results 

40 — 

Entry HSM51B21 from database EMBL : 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 
MS1B21 

45 Score = 112111 P = O-Oe+OD-. identities * 22A7/23M3 



Medline entries 

50 --- 

No Medline entry 



55 

Peptide information for frame 2 
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ORF from ED bp to THD bpi peptide length: 307 
Category: similarity to unknown protein 
Classification: no clue 

5 1 MAALAPLPPL PAflLKSIflHH LRTAflEHDKR DPVVAYYCRL YACIflTGHKID 

SI SKTPECRKFL SKLUDflLEAL KKflLGDNEAI TflEIVGCAHL ENYALKP1FLY 

1D1 ADNEDRAGRF HKNI1IKSFYT ASLLIDVITV FGELTDENVK HRKYARWKAT 

151 YIHNCLKNGE TPflAGPVGIE EDNDIEENED AGAASLPTflP TflPSSSSTYD 

201 PSNRPSGNYT GIfllPPGAHA PANTPAEVPH STGVASNTIfl PTPflTIPAID 

10 2S1 PALFNTISflG DVRLTPEDFA RAflKYCKYAG SALflYEDVST AVflNLflKALK 
301 LLTTGRE 



15 BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphf br2_7flelfl-, frame 2 

20 

No Alert BLASTP hits found 

Pedant information for DKFZphf br2_7aelfl n frame 2 

25 



Report for DKFZphf br2_76elfl • 2 

CLENGTH1 313 
30 EriliO 344b3.^5 

EH0M0L3 PIR:T047 c ia hypothetical protein FIDIIES.ID - 

Arabidopsis thaliana 3e-22 
IKIO All_Alpha 
35 IKUDI L0ld_C0nPLEXITY Ib-bl '/. 

SEfl GEFGVEHAALAPLPPLPAflLKSIflHHLRTAflEHDKRDPVVAYYCRLYAIIflTGIIKIDSKTP 

SEG xxxxxxxxxxxxx 

40 PRD ccchhhhhheeecccccchhhhhhhhhhhhhhhhcccceeehhhhhhhhhhccccccccc 

SE<2 ECRKFLSKLMDflLEALKKflLGDNEAITflEIVGCAHLENYALKNFLYADNEDRAGRFHKNII 

SEG 

PRD chhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhccccccccccchh 

45 

SEfl IKSFYTASLLIDVITVFGELTDENVKHRKYARUKATYIHNCLKNGETPflAGPVGIEEDND 

SEG xxxxxxx- 

PRD hhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhccccccccccccccccc 

50 SEfl IEENEDAGAASLPTflPTflPSSSSTYDPSNIIPSGNYTGIfllPPGAHAPANTPAEVPHSTGV 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEfl ASNTIflPTPflTIPAIDPALFNTISflGDVRLTPEDFARAflKYCKYAGSALflYEDVSTAVflN 

55 SEG -•• •- 

PRD cccccccccccccccccccccccccccccccchhhhhhhhhhhhhcceeeecchhhhhhh 

SEfl Lfl.KALKLLTTGRE 
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WO 01/98454 PCT/IB01/02050 

SEG 

PRD hhhhhhhhccccc 

5 (No Prosite data available for DKFZphf br2_7flelfi .2) 
(No Pfam data available for DKFZphf br2_7flelfl .5) 
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5 group: metabolism 

DKFZphf br2_7fli21 encodes a novel "477 amino acid protein with 
similarity to beta-aspartate methyltransf erases • 

10 The L-isoaspartyl methyltransf erase (Pimt)n as an examplei is a 
highly conserved enzyme utilising S-adenosy lmethionine (Adoflet) 
to methylate aspartate residues of proteins damaged by age- 
related isomerisation and deamidation- 

15 The new protein can find application in diagnosis/modulation of 
protein damage and age-related degenerative processes- 



unknown protein 

20 

weak similarity to beta-aspartate methyltransf erase pimT of 
Mycobacterium leprae 
perhaps complete cds- 

25 Sequenced by MediGenomix 

Locus: unknown 

Insert length: IflME bp 
30 Poly A stretch at pos- ISITt polyadenylation signal at pos. IADS 



1 CCTTCGCGAA 
51 GTGCCTGCGG 
35 101 AGGAGCCCTT 
151 GACCTGCGAG 
SD1 AGGAGCAGAG 
251 CTGGATGTCT 
301 GAGTCATCCC 
40 351 CGGTCCCACA 
H01 AGTCCGCTAC 
■451 TCCAGAGAGA 
501 GGAGGGAGAA 
551 TCTTAAATAG 
45 hOl TTCCCCGGCC 
L.51 GAGGCCAGCC 
701 TAACATTCCC 
751 CCAGGTGATA 
601 ATTTTTATCC 
50 fiSl TACGAAAAGA 
101 GATTCATGGA 
=151 TATTCATAAG 
1001 TTGACGCAGT 
1051 TTTTACCCAC 
55 1101 CATCACACAG 
1151 CTCTTTCATG 
1E01 TGCCTTGCAA 
1ES1 AATCAACACA 



ACACTATGCT AATGGCATGG 
CAGGGGCTCG GAACCAATTC 
CGAGGGAGCT CGGTCACTGT 
ATGGAGAAAG AGAGCACGAG 
TCTTGCCCAT CTCTCCCTCT 
TTCGTCACTG GAAAACCTCA 
CTCGAGAGCT CGAGGACTCG 
CACCAGGGAT CCGAGGATCC 
CGAGGTCGAA GAGCGTCACG 
GACCCTTTCA GGCTGGGGAA 
ACAAAATTTA AGAAATTATT 
TAACTGGGGG GCAGTCCCGT 
AGATACTGAG GAGTTCCTTC 
TTGGAAGACT ATGTAGTATT 
AAAGGATATT AATATGATTC 
CTGTTTTGGA AGCTGGCTCA 
AAAGCAGTTG GATCACAAG6 
CCACCATGAT CTGGCTAAGA 
AATTAAGTCA TGTAGAAGAG 
GACATTTCAG GAGCAACCGA 
AGCTTTGGAT ATGTTAAATC 
ATCTTAAGCA TGGTGGTGTA 
GTTATTGAAC TTTTAGATGG 
TGAAAAGATA AGCGAGGTCA 
AACAGAAAAA TGGAATTTTA 
GATGTACAAC TAGATTCTCA 



TGCCGCGGTC CTGTCTTGCT 
ATTCCTGCAC GGCCTGGGGC 
GTTGCAGGTC CTCGCCTAGA 
GCGGCACAAA GGAAAGCCCC 
GAGCATCTCG GACATTGGGA 
GACTGCCGAC GCTGCGGGAA 
AGCGGAGACC AGGGCCGGTG 
TTCGATGCTC TCGCAGGCCC 
TCTCCCCTTC TTGTTCAACT 
CTGATTTTAG CTGAGACTGG 
TAGGTTGAAC AACTTCGGAC 
TCGGCAAGAT CGTGGGGAAG 
GGTAAGCAGT ACATGCTGAG 
GATGAAAAGA GGGACTGCCA 
TCTCAATGAT GGATATCAAC 
GGCTCTGGTG GAATGAGCTT 
ACGAGTCATA AGTTTTGAGG 
AGAATTACAA ACACTGGCGT 
TGGCCAGACA ATGTGGATTT 
AGACATAAAA TCTTTAACAT 
CTCATGTTAC TTTGCCTGTT 
TGTGCTGTAT ATGTAGTAAA 
AATTCGCACC TGTGAACTTG 
TTGTCAGAGA TTGGTTGGTT 
GCTCAAAAAG TAGAATCTAA 
AGAGAAAATT GGAGTTAAAG 
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13D1 GTGAGCTGTT TCAAGAGGAT GACCATGAAG AATCGCATTC TGATTTTCCA 

13S1 TATGGATCAT TTCCCTATGT TGCTAGACCA GTACACTGGC AACCTGGTCA 

1MQ1 TACAGCTTTT CTTGTCAAGT TGAGGAAGGT CAAACCACAA CTTAACTGAG 

1M51 TACTCCAGAT GACAGTAACT GACTTGAAGA TGGAAAAATA TCAAAATAGA 

5 1SD1 ACTTTATATT GAAAATCACT GCTTCCATAG ATTGGCATTT TTAGCTATTA 

1SS1 CTATGACTTA TATAACTTAT ACATATAATT TTGAAAATAA CAACTAAAAG 

IbDl ATGTATAACA TAGCAAAACT GCTTAAACAT CCCATTTTGA CACTTGTCTT 

11.51 GCAGTTAGTT TGACATTTTG TAGTTAATGA TTCCAAATTG GTTTAGTTGG 

17D1 GCCATCTCAT TCTTCACTTC CTGTAAACCA CTCCATAGAT TTGTCTTTCT 

10 1751 TCAAGAAATT AGTTTTCTTT CCTTTATTTG ATTGATGGTC ATTGACTACT 

IflDl GAAATAAAAT ATGCATTTTA AGAAAAAAAA AAAAAAAAAA AA 



BLAST Results 

15 

No BLAST result 



20 Medline entries 



No Hedline entry 

25 

Peptide information for frame 1 



30 ORF from lb bp to IHMb bp=l peptide length: i»7? 
Category: putative protein 
Classification: no clue 

1 IILflAlilCRGPV LLCLRflGLGT NSFLHGLGflE PFEGARSLCC RSSPRDLRDG 

35 51 EREHEAAflRK APGAESCPSL PLSISDIGTG CLSSLENLRL PTLREESSPR 

101 ELEDSSGDfiG RCGPTHflGSE DPSI1LS<2A(2S ATEVEERHVS PSCSTSRERP 

151 FdAGELILAE TGEGETKFKK LFRLNNFGLL NSNUGAVPFG KIVGKFPGfJI 

SD1 LRSSFGKflYH LRRPALEDYV VLMKRGTAIT FPKDINIIILS MMDINPGDTV 

251 LEAGSGSGGM SLFLSKAVGS fiGRVISFEVR KDHHDLAKKN YKHURDSWKL 

40 301 SHVEEUPDNV DFIHKDISGA TEDIKSLTFD AVALDMLNPH VTLPVFYPHL 

351 KHGGVCAVYV VNITflVIELL DGIRTCELAL SCEKISEVIV RDULVCLAKU 

MD1 KNGILAflKVE SKINTDVGLD SfiEKIGVKGE LFlJEDDHEES HSDFPYGSFP 
MSI YVARPVHlillJP GHTAFLVKLR KVKPflLN 

45 



BLASTP hits 

No BLASTP hits available 

50 

Alert BLASTP hits for DKFZphf br2_7fli21 i frame 1 
No Alert BLASTP hits found 
55 Pedant information for DKFZphf br2_7fli 21 n frame 1 



Report for DKFZphf br2_7fii 21 . 1 
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ELENGTH3 i»flE 
EfflilJ 535E1-E0 
5 CpI3 b-Efi 

EHOriOLJ TREMBL : AFDflflflDO_E product: "unknown"; Rhodococcus 

erythropolis ARC (arc) genei complete cds=l and unknown genes- Ee- 
E3 

EFUNCAT3 r general function prediction CM- jannaschiii 

10 f1JD13m be-lD 

EFUNCATJ DS-O? translational control ES. cerevisiae-i YJLlEScI 
be-OM 

CBL0CKSJ BLOOflQIE 
EBL0CKS3 BLD1S7^A 
15 CKU3 Alpha_Beta 

EKU3 L0bl_C0HPLEXITY E-MT * 

SE<3 PSRNTMLMAUCRGPVLLCLRflGLGTNSFLHGLGflEPFEGARSLCCRSSPRDLRDGEREHE 

20 SEG 

PRD cccceeeeeecccccchhhhhccccceeeeeccccceeeceeeeccccccccccchhhhh 

SE<2 AAdRKAPGAESCPSLPLSISDIGTGCLSSLENLRLPTLREESSPRELEDSSGDCGRCGPT 

SEG 

25 PRD hhhhhccccccccccceeeeeecccccccceeeccccccccccccccccccccccccccc 

SE(3 H(2GSEDPSMLSl2A(2SATEVEERHVSPSCSTSRERPFl2AGELILAETGEGETKFKKLFRLN 

SEG 

PRD cccccccchhhhhhhhhhhhhccccccccccccccccccceeeecccccccceeeeeecc 

30 

SEfl NFGLLNSNUGAVPFGKIVGKFPGfllLRSSFGKflYMLRRPALEDYVVLMKRGTAITFPKDI 

SEG 

PRD ccccccccccccccceeeccccceeeeecccceeeeccchhhhhhhhhhccceeeecccc 

35 SEfl NMILSriMDINPGDTVLEAGSGSGGriSLFLSKAVGSflGRVISFEVRKDHHDLAKKNYKHIilR 

SEG xxxxxxxxxxxx 

PRD cceeecccccccceeeeeccccchhhhhhhhhccccceeeeeehhhhhhhhhhhhhhhhh 

SEC3 DSUKLSHVEEWPDNVDFIHKDISGATEDIKSLTFDAVALDNLNPHVTLPVFYPHLKHGGV 

40 SEG 

PRD hccccccccccccceeeeecccccccccccccccceeeecccccccchhhhhhhcccccc 

SE(2 CAVYVVNITflVIELLDGIRTCELALSCEKISEVIVRDULVCLAKflKNGILAflKVESKINT 

SEG 

45 PRD eeeeeechhhhhhhhhhhhhhhhhhhhccceeeeeehhhhhhhhhhccceeeeccccccc 

SEfl DVt3LDSflEKIGVKGELF(3EDDHEESHSDFPYGSFPYVARPVHbltJPGHTAFLVKLRKVKPl2 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccceeeeeccccccc 



50 



55 



SE<3 LN 
SEG - - 
PRD cc 



(No Prosite data available for DKFZphf brE_7fiiEl -1) 
(No Pfam data available for DKFZphf brE_7 A i El • 1 ) 

-233- 



WO 01/98454 



PCT/IB01/02050 



DKFZphmel2_12j1

group: melanoma derived

DKFZphmel2_12j1 encodes a novel 905 amino acid protein, which has similarity to integrin I of Saccharomyces cerevisiae.

The novel protein contains a leucine zipper.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of melanoma-specific genes.

weak similarity to integrin I (Saccharomyces cerevisiae)

Sequenced by EMBL

Locus: unknown

Insert length: bp

Poly A stretch at pos. 2=J2b, no polyadenylation signal found
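A poly A stretch is the run of adenines at the 3' end of the cDNA insert. The sketch below shows one way such a terminal run could be located in a sequence string. The function name `polya_start` and the minimum run length of 10 are assumptions for illustration only; the report does not state how its position was computed.

```python
def polya_start(seq: str, offset: int = 0, min_run: int = 10):
    """Return the 1-based position of the first A in the terminal A-run,
    or None if the sequence does not end in at least `min_run` A's.
    `offset` is the number of bases that precede `seq` in the full sequence."""
    s = seq.upper()
    stripped = s.rstrip("A")          # drop the trailing run of A's
    run_length = len(s) - len(stripped)
    if run_length < min_run:
        return None
    return offset + len(stripped) + 1


# Last 42 bases of the DKFZphmel2_12j1 cDNA (positions 2901-2942).
last_line = "ATTTAGTAAA" "GACCCTGATC" "TGTTGCAAAA" "AAAAAAAAAA" "AA"
print(polya_start(last_line, offset=2900))
```

The call prints the 1-based position of the first A in that terminal run under this sketch's convention.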



1 CGAAAGCTAA AGGCCGGCGC ACGCTGGGCG GTGGTGGTCC CTAAGCCGGG
51 CCGCGGCCGG TGCAATGGAC TCCACTGCCT GCTTGAAGTC CTTGCTCCTG
101 ACTGTCAGTC AGTACAAAGC CGTGAAGTCA GAGGCGAACG CCACTCAGCT
151 TTTGCGGCAC TTGGAGGTAA TTTCTGGACA GAAACTCACA CGACTATTTA
201 CATCAAATCA GATATTAACA AGTGAATGCT TGAGTTGCCT TGTAGAGCTA
251 CTTGAAGACC CCAACATAAG TGCTTCACTG ATCTTAAGTA TTATCGGTTT
301 GCTGTCTCAA CTAGCAGTAG ACATTGAAAC CAGAGATTGT CTTCAGAATA
351 CATATAATCT GAATAGTGTG CTGGCGGGAG TGGTTTGTCG GAGCAGCCAC
401 ACTGATTCGG TGTTTTTGCA GTGCATTCAA CTTCTACAGA AGTTAACATA
451 TAATGTCAAA ATTTTCTATT CTGGTGCCAA TATAGATGAA TTAATTACGT
501 TCCTGATAGA TCACATTCAA TCTTCTGAAG ATGAGTTAAA AATGCCTTGT
551 CTAGGATTAT TGGCAAATCT TTGTCGGCAC AATCTTTCTG TTCAAACGCA
601 CATAAAGACA TTGAGTAATG TGAAATCTTT TTATCGAACT CTTATCACCT
651 TGTTGGCCCA TAGTAGTTTA ACTGTGGTTG TGTTTGCACT TTCAATATTA
701 TCCAGTTTGA CATTAAATGA AGAGGTGGGG GAAAAGCTAT TCCATGCTCG
751 AAACATTCAT CAGACTTTTC AACTAATATT TAATATTCTC ATAAACGGTG
801 ATGGCACTCT AACTAGAAAG TATTCAGTTG ACCTACTGAT GGATCTCCTT
851 AAGAATCCTA AAATTGCTGA TTATCTCACC AGATATGAGC ACTTTTCTTC
901 ATGTCTTCAC CAAGTATTAG GTCTTCTTAA TGGAAAGGAT CCTGATTCCT
951 CTTCAAAGGT TTTAGAATTA CTTCTTGCCT TCTGTTCAGT GACTCAGCTG
1001 CGCCATATGC TCACTCAGAT GATGTTTGAA CAGTCTCCAC CTGGCAGCGC
1051 CACTCTGGGA AGCCATACTA AATGTTTAGA ACCTACTGTG GCTCTACTGC
1101 GCTGGTTAAG CCAACCTTTG GACGGATCAG AAAACTGTTC TGTTTTAGCA
1151 TTGGAGTTGT TCAAGGAAAT ATTTGAGGAT GTCATAGATG CTGCTAACTG
1201 TTCCTCGGCT GATCGTTTTG TGACCCTTCT GCTGCCTACA ATCCTTGATC
1251 AACTTCAGTT CACAGAACAA AATCTAGATG AGGCTTTAAC AAGAAAAAAT
1301 GTGAAAGGGA TTGCCAAGGC CATTGAAGTT TTGTTAACTC TCTGTGGAGA
1351 TGATACACTA AAAATGCATA TTGCAAAAAT CTTGACAACT GTCAAGTGTA
1401 CCACTCTTAT AGAACAACAA TTTACATATG GCAAGATTGA CCTGGGATTT
1451 GGAACAAAGG TTGCAGATTC TGAATTATGC AAACTTGCTG CTGATGTAAT
1501 TTTGAAAACT CTTGATTTGA TTAACAAACT TAAACCATTG GTTCCTGGTA
1551 TGGAAGTAAG CTTCTACAAA ATACTTCAGG ACCCACGTTT GATTACTCCT
1601 TTGGCTTTTG CTTTAACGTC AGATAATAGA GAACAAGTAC AGTCTGGACT
1651 GAGAATATTA TTGGAGGCTG CTCCACTGCC AGATTTTCCT GCTTTAGTAC
1701 TTGGAGAAAG TATAGCAGCA AACAATGCCT ATAGACAACA GGAAACAGAA
1751 CATATACCCA GAAAAATGCC CTGGCAATCA TCAAATCACA GTTTTCCAAC
1801 ATCAATAAAG TGTTTAACTC CTCATTTGAA AGATGGTGTT CCTGGATTGA
1851 ATATTGAAGA ATTAATAGAG AAACTTCAGT CTGGAATGGT GGTAAAGGAT
1901 CAGATTTGTG ATGTGAGAAT ATCTGACATA ATGGATGTAT ATGAAATGAA
1951 ACTATCCACA TTAGCTTCCA AAGAAAGCAG GCTACAAGAT CTTTTGGAAA
2001 CAAAAGCTCT AGCCCTTGCA CAGGCTGATA GACTGATTGC TCAGCATCGC
2051 TGTCAAAGAA CTCAAGCTGA AACAGAGGCA CGGACACTTG CTAGTATGTT
2101 GAGAGAAGTT GAGAGAAAAA ATGAAGAGCT TAGTGTGTTG CTGAAGGCGC
2151 AGCAAGTTGA ATCAGAAAGA GCGCAGAGTG ATATTGAGCA TCTCTTTCAA
2201 CATAATAGGA AGTTAGAGTC TGTGGCTGAA GAACATGAAA TACTGACAAA
2251 ATCCTACATG GAACTTCTTC AGAGAAATGA AAGTACTGAA AAGAAGAATA
2301 AAGATTTACA GATCACATGT GATTCTCTGA ATAAACAAAT TGAGACAGTG
2351 AAAAAGTTGA ATGAGTCACT CAAGGAACAA AATGAAAAAA GTATTGCCCA
2401 ATTAATAGAG AAAGAAGAAC AGAGAAAAGA AGTACAGAAT CAGCTAGTAG
2451 ACAGAGAACA TAAGCTAGCA AATTTGCATC AAAAAACAAA AGTACAAGAA
2501 GAAAAGATTA AAACCTTACA AAAGGAAAGG GAAGATAAGG AAGAAACCAT
2551 TGATATCCTT AGAAAAGAAT TAAGCAGAAC AGAACAGATA AGAAAAGAGT
2601 TGAGCATTAA GGCTTCCTCC CTAGAGGTTC AAAAGGCACA ATTAGAAGGT
2651 CGTTTGGAAG AGAAAGAGTC CTTGGTGAAA CTTCAGCAAG AGGAATTGAA
2701 CAAACACTCC CACATGATAG CAATGATCCA CAGTTTAAGT GGTGGAAAAA
2751 TAAATCCAGA AACTGTGAAT CTCAGTATAT AGACATTATG GCATTTTGGA
2801 ATTTGTAATC TCATGATATT TTTGATGTAT TTATCTATTG GkGGGGGGGT
2851 GGGTAGGGGA GTTAATTTGT GACTTCGTAA CAATAAGAAG TTATTATCTA
2901 ATTTAGTAAA GACCCTGATC TGTTGCAAAA AAAAAAAAAA AA
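The reported ORF for this clone (65 bp to 2779 bp) can be checked against the cDNA by translating codon by codon with the standard genetic code. A minimal sketch, assuming NCBI translation table 1; the names `CODON_TABLE` and `translate` are illustrative and not part of the original report.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with bases in TCAG order,
# first base varying slowest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, start: int, end: int) -> str:
    """Translate the 1-based, inclusive interval [start, end] of `dna`."""
    cds = dna[start - 1:end].upper()
    if len(cds) % 3:
        raise ValueError("interval length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Positions 51-100 of the cDNA; subtract 50 to convert to segment coordinates.
segment = "CCGCGGCCGGTGCAATGGACTCCACTGCCTGCTTGAAGTCCTTGCTCCTG"
print(translate(segment, 65 - 50, 94 - 50))   # first ten codons of the ORF
print((2779 - 65 + 1) // 3)                    # number of codons from 65 to 2779
```

Under this reading the ORF spans 905 codons, and the first ten translate to MDSTACLKSL, the opening residues of the peptide listed for frame 2; in the sequence, a TAG stop codon occupies positions 2780-2782.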



BLAST Results

No BLAST result

Medline entries

=^□31111:

Hostetter MK, Tao NJ, Gale C, Herman DJ, McClellan M, Sharp RL, Kendrick KE. Antigenic and functional conservation of an integrin I-domain in Saccharomyces cerevisiae. Biochem Mol Med 1995 Aug;55(2):122-30.

99456454:

Berton G, Lowell CA. Integrin signalling in neutrophils and macrophages. Cell Signal 1999 Sep;11(9):621-35.



Peptide information for frame 2

ORF from 65 bp to 2779 bp; peptide length: 905
Category: putative protein
Classification: Cellular transport and traffic
Prosite motifs: LEUCINE_ZIPPER (331-352)

1 MDSTACLKSL LLTVSQYKAV KSEANATQLL RHLEVISGQK LTRLFTSNQI
51 LTSECLSCLV ELLEDPNISA SLILSIIGLL SQLAVDIETR DCLQNTYNLN
101 SVLAGVVCRS SHTDSVFLQC IQLLQKLTYN VKIFYSGANI DELITFLIDH
151 IQSSEDELKM PCLGLLANLC RHNLSVQTHI KTLSNVKSFY RTLITLLAHS
201 SLTVVVFALS ILSSLTLNEE VGEKLFHARN IHQTFQLIFN ILINGDGTLT
251 RKYSVDLLMD LLKNPKIADY LTRYEHFSSC LHQVLGLLNG KDPDSSSKVL
301 ELLLAFCSVT QLRHMLTQMM FEQSPPGSAT LGSHTKCLEP TVALLRWLSQ
351 PLDGSENCSV LALELFKEIF EDVIDAANCS SADRFVTLLL PTILDQLQFT
401 EQNLDEALTR KNVKGIAKAI EVLLTLCGDD TLKMHIAKIL TTVKCTTLIE
451 QQFTYGKIDL GFGTKVADSE LCKLAADVIL KTLDLINKLK PLVPGMEVSF
501 YKILQDPRLI TPLAFALTSD NREQVQSGLR ILLEAAPLPD FPALVLGESI
551 AANNAYRQQE TEHIPRKMPW QSSNHSFPTS IKCLTPHLKD GVPGLNIEEL
601 IEKLQSGMVV KDQICDVRIS DIMDVYEMKL STLASKESRL QDLLETKALA
651 LAQADRLIAQ HRCQRTQAET EARTLASMLR EVERKNEELS VLLKAQQVES
701 ERAQSDIEHL FQHNRKLESV AEEHEILTKS YMELLQRNES TEKKNKDLQI
751 TCDSLNKQIE TVKKLNESLK EQNEKSIAQL IEKEEQRKEV QNQLVDREHK
801 LANLHQKTKV QEEKIKTLQK EREDKEETID ILRKELSRTE QIRKELSIKA
851 SSLEVQKAQL EGRLEEKESL VKLQQEELNK HSHMIAMIHS LSGGKINPET
901 VNLSI

BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphmel2_12j1, frame 2

TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene, complete cds, N = 1, Score = 216, P = 1.3e-13

>TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene, complete cds.
Length = 1,015

HSPs:

Score = 216 (32.4 bits), Expect = 1.3e-13, P = 1.3e-13
Identities = 80/302 (26%), Positives = 155/302 (51%)

Query: 597 IEELIEKLQSGMVVKDQICDVRISDIM--DVYEMKLSTLASKESRLQDLLETKALALAQ 653
            I L EKL++ D+ + +IS++ + E +L+ + ++ L+ LET AL +
Sbjct: 275 ISLLKEKLETATTANDENVN-KISELTKTREELEAELAAYKNLKNELETKLETSEKALKE 333

Query: 654 A--DRLIAQHRCQRTQAETEARTLASMLREVERKNEELSVLLKA--QQVESERAQ 704
+ + + + Q + TE+ +L + L +E+++E+L+ LK

+(J+ ++ Q 
Sbjct: 33^ 

VKENEEHLKEEKI<JLEKEATETKlJ<JLNSLRANLESLEKEHE]>LAA(JLKKYEE<JIANKER<J 3^3 


(Juery: 70S SDIEHLFflHNRKLESVAEEHEILTKSYMEL L(JRNESTEKKNKDL(JIT- 

CDSLNKfllE 7bD 

+ E + (J N ++ S +E+E + K EL ++ +ST ++ +L+ + 

D+LN <ji+ 
10 Sbjct: 3m YN- 

EEIS<2LNDEITST(J<JENESIKKKNDELEGEvKAP1KSTSEElJSNLKKSEIDALNL<JIK MSE 

(Juery: 7bl 

TVKKLNESLKE(JNEKSIA(JLIEKEE(JRKEV(2N(JLVDREHKLANLH<JKTKv'(IEEKIKT fll? 

15 +KK NE+ + + SI + + + KE+A++ +E +++ L K K 

E+K 

Sbjct: i|53 

ELKKKNETNEASLLESIKSIESETVKIKELlJDECNFKEKEVSELEDKLKASEDKNSKYLE 51E 

20 (Juery: fllfi LlJKEREDKEETIDI LRKELSRTElJIRKELSIKASSLE- 

ViJKAlJLEGRLEEKESLVK Q7E 

LUKE E +E +D L+ +L + + K S L ++K E R 

+E L K 

Sbjct: 513 

25 L(JKESEKIKEELDAKTTELKI(JLEKVTNLSKAKEKSESELSRLKKTSSEERKNAEE(JLEK S7E 

(Juery: 673 L(3(3E 67t 
L+ E 

Sbjct: 573 LKNE 57b 


Score = 186 (27. bits), Expect = 2.0e-10, P = 2.0e-10
Identities = 82/301 (27%), Positives = 155/301 (51%)
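The Identities and Positives fields are counts over the aligned length, with the integer percentage in parentheses. A small sketch reproducing the percentages of the first two HSPs of the SCINTANA_1 hit. Rounding to the nearest integer is an assumption; for these four values truncation gives the same result.

```python
def percent(count: int, length: int) -> int:
    """Integer percentage of aligned columns, rounded to the nearest value."""
    return round(100 * count / length)


hsps = [
    ("HSP 1 identities", 80, 302),
    ("HSP 1 positives", 155, 302),
    ("HSP 2 identities", 82, 301),
    ("HSP 2 positives", 155, 301),
]
for label, count, length in hsps:
    print(f"{label}: {count}/{length} ({percent(count, length)}%)")
```

Identities counts identical residues only, while Positives also counts conservative substitutions, so the second value is always at least as large as the first.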

(Juery: 51fi EELIEKL(JSGI1VVKDflICDVRISl)II1DVYEriKLSTLASKESR— LflD- 
35 LLETKALALAfl t,53 

+ELI +L(3+ +K + » S++ V L K++ LlJD +L 

K 

Sbjct: bll DELI- 

RLiJNENELKAKEIDNTRSELEKVSLSNDELLEEKflNTIKSLlJDEILSYKDKITRN tbT 


(Juery: bS^l 

ADRLIA<JHRC<JRT(JAETEARTLASriLREVERKNEELSvLLKA<J(JVESERA(JSDIEHLF<JH 713 
++L++ R + E+ L LR + ++ LK + ES + 

++++E + 

45 Sbjct: b?D DEKLLSIERDSKRDLES 

LKEflLRAAflESKAKV/EEGLKKLEEESSKEKAELEKSKEM 7E5 

(Juery: 71H NRKLESVAEEHEILTKSYNELLlJRN-ESTEKKNKDLtJITCDSL- 
NKlJIETVKKLNESLKE 771 
50 +KLES E +E KS ME ++++ E E+ K + +L +++ + + 

++NES K+ 
Sbjct: 7Sb 

MKKLESTIESNETELKSSNETIRKSDEKLE(JSKKSAEEDIKNL(JHEKS])LISRINESEK]> 735 

55 (Juery: 77E (JNE-KSIA(JLIEKEE(JRKE-ViJN(JLVDREHKL- 
ANLH£3KTKV<JEEKIKTL<JKEREDKEET 3EB 

E KS ++ K E V+ +L + + K+ N + T V + K++ 

+++E +DK+ 
Sbjct: 786 IEELKSKLRIEAKSSSELETVKQELNNAQEKIRVNAEENT-
VLKSKLEDIERELKDKflAE 6^^ 

(Juery: fl2T IDILR — KEL — SRTEfllRKEL SIKASSLEVCJKAdLE- 

5 GRLEEKESLVKLfl 67^4 

I + KEL SR +++ +EL S + S EV+K (2+E 

+L+EK L++ + 
Sbjct: flMS 

IKSNfiEEKELLTSRLKELEflELDSTfiflKAflKSEEESRAEVRKFflVEKSflLDEKAIILLETK 1DM 


fluery: fl75 (3EEL-NK flfiO 
+L NK 

Sbjct: TDS YNDLVNK 111 

15 Score = 173 (Eb-0 bits)-. Expect = 5.?e-01n P = S^e-QT 
Identities = 77/E67 (St.*)-. Positives = lib/SB? (50'/.) 

fluery: bDl IEKLflSGMVVKDfllCDVRISPiriDVYEriKLSTLASKES-- 
RLflDLLETKALALAflADRLI bSB 
20 ++K + + K++ + IS + D E+ ST ES + D LE + 

A+ 

Sbjct: 3BD. LKKYEEfllANKERflYNEEISfiLND — EIT- 
STflflENESIKKKNDELEGEVKAMKST 13E 

25 fluery: bSI AflHRC<3RT<2AETEARTLASI1LREVERKNE-- 

ELSVLLKAt3<2VESERA<3SDIEHLF<2H-NR 715 

++ + ++E +A |_ ++E+++KNE e s+l + +ESE + I+ 

L N 

Sbjct : M33 SEE<3SNLKKSEIDALNL--(2IKELKKKNETNEASLLESIKSIESETv'K — 
30 IKELflDECNF MSB 

(2uery: 71b KLESVAEEHEILTKSY— 
HELL<3RNESTEKKNKDLflITCDSLNK<2IETVKKLNESLKE<2 77E 

K + V+E + L S + L+ + +EK ++L L (2+E V 

35 L+++ KE+ 

Sbjct: MAT 

KEKEVSELEDKLKASEDKNSKYLEL(3KESEKIKEELDAKTTELKI(2LEKVTNLSKA-KEK SH7 

Query: 773 NEKSIAULIE-KEEfiRKEVflNflL— VDREHKLAN — 
40 LH<JKTKV(2EEKIKTLl2KERE]>KEE SB7 

+E +++L + E+RK + <2L + E ++ N ++ K+ E T+ 

+E +K 

Sbjct: 5HB 

SESELSRLKKTSSEERKNAEEt2LEKLKNEI(2IKN(2AFEKERKLLNEGSSTIT<2EYSEKIN bQ7 


fluery: flEfl TI- 

DILRKELSRTEflIRKELSIKASSLEV(2KA<2LEGRLEEKESLVKL<2flEELNKHSHriI BBS 

T+ D L + + E KE+ S LE + LEEK++ +K (2+E+ 

+ I 
50 Sbjct: bDB 

TLEDELIRL(3NENELKAKEI])NTRSELEKVSLSNDELLEEKl3NTIKSL<3DEILSYKDKI bbb 

Score = 171 (ES-7 bits)-. Expect = l^e-CH-. P = T-3e-01 
Identities = 7b/311 (E»4*)i Positives = 1SE/311 (H&'z.) 


fiuery: S=Jb NIEELIEKLflSGMVVKDiS 

ICDVRISDIHDVYEHKLSTLASICESRLI3DLLETKA bMB 
N EE +EKL++ + +K+I2 + + SI Y K++TL +

RLfl + E KA 
Sbjct: Sb5 

NAEEflLEKLKNEIfllKNflAFEKERKLLNEGSSTITflEYSEKINTLEDELIRLflNENELKA b2>4 


fluery: bHI LALAflADRLIAflHRCflRTflA-ETEARTLASIILREVERKNEELSVL- 
LKAflflVESERAflSD 70b 

+ + + + E + T+ S+ E+ ++++ K 

+E + ++ D 
10 Sbjct: b2S 

KEIDNTRSELEKVSLSNDELLEEKflNTIKSLflDEILSYKDKITRNDEKLLSIERD-SKRD bfl3 

fluery: 7D7 IEHLFflHNRKL-ESVAEEHEILTKSYtlELLflRNESTEKKN ■ 

KDLfllTCDS LNKfl 7B6 

15 +E L + R ES A+ E L K E + EK K L+ T +S 

L 

Sbjct: bflM 

LESLKEflLRAAflESKAKVEEGLKKLEEESSKEKAELEKSKEJIIIKKLESTIESNETELKSS 713 

20 fluery: IETVKKLNESLKEflNEKSIAflLIEK- 
EEflRKEVflNflLVDREHKLANLHflKTKVflEE— K 614 

+ET++K +E L EA++KS +1+ +++++++++ e + L K 

+++ + + 

Sbjct: 7M4 METIRKSDEKL- 
25 EflSKKSAEEDIKNLflHEKSDLISRINESEKDIEELKSKLRIEAKSSSE S02 

fluery: aiS IKTLflKEREDKEETIDILRKE LSRTEfllRKELSIKASSL— 

EVflKAflLEGRLEEK fib? 

++T+++E + +E ! + +E S+ E I +EL K + + + +K 

30 L RL+E 

Sbjct: fiD3 

LETVKflELNNAflEKIRVNAEENTVLKSKLEDIERELKDKflAEIKSNflEEKELLTSRLKEL fib2 

fluery: flbfi ESLVKLflflEELNK flfiD 
35 E + fl++ K 

Sbjct: fib3 EflELDSTflflKAflK fi7S 






Score = IbS (SM-fi bits). Expect = M-le-Ofl-. P = 4-le-Ofl 
Identities = b5/2fib ( 22'/.) -, Positives - m^/Sflb (52*) 



fluery: SIS LNIEELIEKLflSGMVVKDfllCDVR-ISDIflDVYEflKLSTLASKESRL- 
flDLLETKALALA bS2 

+N ++ + L+ + K I +++ I++ ++ +++ + l_+ ++ + 

++L+E K+ 

45 Sbjct: im VNHflKETKSLKEDIAAK-- 

ITEIKAINENLEKMKIflCNNLSKEKEHISKELVEYKS-RFfl 17D 

fluery: bS3 flADRLIAflHRCflRTflAETEARTLASMLREVERKNEELSVLLKAflflVESE 

-RAflSDIE 7D8 

50 D L+A+ T+ + ++LA+ ++++ +NE L ++ + ES 

fl+ 1+ 

Sbjct: 171 SHDNLVAK LTE— 

KLKSLANNYKDMflAENESLIKAVEESKNESSIflLSNLflNKID 253 

55 fluery: 70=) HLFflH — NRKLE— 

SVAEEHEILTKSYMELLflRNESTEKKNKDLfllTCDSLNKfllETVKK 7b4 

+ fl N ++E S+ + E L K+ +L fl E K+ + D 

fll +K+ 
Sbjct: 224 SMSQEKENFQIERGSIEKNIEQLKKTISDLEQTKEEIISKSDSSK---
DEYESfllSLLKE EflO 

fluery: 7b5 

LNESLKE(3NEKSIA(3LIEKEEflRKEVflNflLVDREHKLANLH(3KTKV(3EEKIKTLl3KERE]> 82i» 
E+ N++++ ++ E + R+E++ +L ++ L K + E+ +K 

+++ E 

Sbjct: 281 

KLETATTANDENVNKISELTKTREELEAELAAYKNLKNELETKLETSEKALKEVKENEEH 3H0 



fluery: fl2S - 

KEETIDILRKELSRTEl3IRKELSIKASSLEV(3KAl3LEGRLEEKESLVKL(3(3EELNK 880 

KEE I L KE + T+(2 L SLE + L +L++ E + ++ 

+ N+ 

15 Sbjct: 3M1 LKEEKIfl- 

LEKEATETKfidLNSLRANLESLEKEHEDLAA(2LKKYEE<2IANKER(2YNE 31b 



Score = 158 (23-7 bits)-, Expect = l.le-07-. P = l.le-O? 
Identities = 7H/2b8 (27*)-, Positives = 13b/2b8 (SO'/.) 



fluery: 518 EELIEKLtfSGNVVKDfllCDVRISDHIDVYEM—KL- 
STLASKESRLflDLLET — KALALA b52 

+E -K++ G+ ++ +++ Eil KL ST+ S E+ L+ +ET 

K+ 

25 Sbjct: bIS 

flESKAKVEEGLKKLEEESSKEKAELEKSKErHIKKLESTIESNETELKSSriETIRKSDEKL 751 

fluery: bS3 

(2ADRLIA(3HRC(3RT(2AETEARTLASMLREVERKNEELSVLLKAfl(!VESERA(JSDIEHLF(3 712 
30 + + A++C3E LS + EE+ EEL L+ + S 

++ + L 

Sbjct: 755 EflSKKSAEEDIKNLflHEKS-- 
DLISRINESEKDIEELKSKLRIEAKSSSELETVKfiELNN 812 

35 fluery: 713 HNRKLESVAEEHEILTKSYI1ELL(3RNESTEK<NKI>L(3ITCDSLNK(2IET— 
VKKLNESLK 770 

K+ AEE+ +L KS +E ++R E K+K +1 + <++ T 

+K+L + L 

Sbjct: 813 AfiEKIRVNAEENTVL-KSKLEDIER 

40 ELKDK(2AEIKSNt2EEKELLTSRLKELE<2ELD 8b7 

fluery: 771 El2NEKSIA(2LIEKEE<2RKEV<2N<3LVDR— 
EHKLANLHdKTKVfiEEKIKTLflKEREDKEE 827 

+K A<2 E EE R EV+ V++ + K L K K + 

45 +++ + ++ 

Sbjct: 8b8 ST(2CK--A<3KSE- 

EESRAEVRKFflVEKSflLDEKAflLLETKYNDLVNKEflAUKRDEDTVKK 12M 



(3uery: 828 TIDILRKELSRTE<3IRKEL-SIKASSLEV£3KA<2LEGRLE fibS 
50 T D R+E+ E++ KEL ++KA + ++++A ERE 

Sbjct: 125 TTDS(3R(2EI EKLAKELDNLKAENSKLKEAN-EDRSE 151 

Score = 155 (23-3 bits)-, Expect = 3.1e-07-. P = 3-1e-Q7 
Identities = 73/2b1 (27X)-. Positives = 133/2b1 (MT/C) 

55 

Uuery: b2M DVYEHKLSTLASKESRLflD-LLETKALALAflADRLIA(2HRCl2RTl2AET 

EARTLASUL b71 






++ E K +T+ S LC2D +L K ++L++ R + E+ + 

R 

Sbjct: b43 ELLEEKflNTIKS 

LfiDEILSYKDKITRNDEKLLSIERDSKRDLESLKEfiLRAAflESK b^fl 


fluery: bflQ REVE— 

RKNEELSVLLKAfiflVESERAflSDIEHLFflHNRKLESVAEEHEILTKSYriELLfl 73b 

+VE +K EE S KA+ +S+ +E + N + E + 

KS +L a 

10 Sbjct: b^ AKVEEGLKKLEEESSKEKAELEKSKEIIIIKKLESTIESNET-- 
ELKSSMETIRKSDEKLEfl 7Sb 

fluery: 737 RNESTEKKNKDL<mCDSLNK(2IETVKKLNESLKE<3--- 
NEKSIA(2LIEKEEt3RKEVc2Nc2 7=13 
15 +S E+ K+L<2 L +1 +K E LK + KS ++L 

+++ a + 
Sbjct: 757 

SKKSAEEDIKNLflHEKSDLISRINESEKDIEELKSKLRIEAKSSSELETVKflELNNACJEK fllb 

20 (Juery: 714 L-VDREH 

KLANLHfiKTKV(2EEKIKTL(2KEREDKEETIDILRKELSRTE(JIRKEL 8Mb 

+ V+ E KL ++ ++ K ++ +IK+ (3+E+E + L +EL 

T+(2 + + 
Sbjct: 817 

25 IRVNAEENTVLKSKLEDIERELKBK(3AEIKSNflEEKELLTSRLKELEt3ELl>ST(3fl-ICA(3K fl7S 

fluery: 847 SIKASSLEVflKAULE-GRLEEKESLVKLfiflEEL-NK flfiO 

S + S EV+K (2+E +L+EK L++ + +L NK 
Sbjct: fl7b SEEESRAEVRKFflVEKSCJLDEKAriLLETKYNDLVNK 111 


Score = 14b (El-T bits)-, Expect = 3-Se-Ob-. P = 3-5e-0b 
Identities = 73/311 (23fc> Positives = 1SB/311 (4fi*> 

fluery: S2Q DNRE(2V<3SGLRIL LEAAPLPDFPALV — 

35 LGESIAANNAYRflflETEHIPRK-PIPldfi S71 

+++ +V+ GL+ L E A L ++ L +1 +N + E I 

+ + 
Sbjct: bib 

ESKAKVEEGLKKLEEESSKEKAELEKSKEMNKKLESTIESNETELKSStlETIRKSDEKLE 75S 


fluery: 575 SSNHSFPTSIKCLTPHLKDGVPGLNIEEL- 
IEKLflSGMVVKDfllCDVRISDHIDVYEMKL b30 

S S IK L D + +N E IE+L+S + + + + S 

++ + +L 

45 Sbjct: 75b flSKKSAEEDIKNLflHEKSDLISRINESEKDIEELKSKLRI 

EAKSSSELETVKflEL AID 

fluery: b31 STLASK— 

ESRL(2I>LLETKALALA(2ADRLIA(3HRC(2RT(2AETEARTLASI1LREVERKNE bfi7 
50 + K + +L++K L +R+ ++ +E LS 

L+E+E++ + 

Sbjct: All NNAflEKIRVNAEENTVLKSK™ 
LEDIERELKDKflAEIKSNflEEKELLTSRLKELEflELD fib? 

55 fluery: bflfi 

ELSVLLKA(3(2VESERA(2SDIEHLF(3HNRKLESVAEEHEILTKSYriELLl3RNESTEKKN<D 747 
S KAQ+ E E +++++ Ffi + + E+ +L Y +L+ + 

++ ++ 




Sbjct: aba — ST<2<3KAfiKSEEE-SRAEVRK-F(2VEKS~ 
CLDEKAMLLETKYNDLVNKEflAbJKRDEDT 1S1 

fluery: 748 L<2ITCDSLNK<2IETVKKLNESLKE<2NEKSIAflLIEKEEd!RKEV<3N<2LV 

DREHKLANL 304 

++ T DS ++IE + K ++LK +N K L E E R E+ + ++ D 

+ K N 

Sbjct: 1EE VKKTTDSflRflEIEKLAKELDNLKAENSK 

LKEANEDRSEIDDLPILLVTDLDEK — NA 175 

fluery: 805 HflKTKVflEEKIKTLflKEREDKEETID 830 

++K+++ ++ E +D+EE D 
Sbjct: 17b KYRSKLKDLGVEISSDEEDDEEEEDD 1001 

15 Score = 14b (ELI bits)-. Expect = 4-be-Ob-. P = 4-be-Ob 
Identities = BE/313 (SbX)-. Positives = 145/313 <4bX) 

fluery: 518 

EELIEKLflSGH VVKDfllCDVRISDiriDVYEriKLSTLASKESRLflDLLETKALALAflADRL bS7 
20 EEL +L + +K+++ + + E+K + KE ++<2 LE +A 

a 

Sbjct: 304 EELEAELAAYKNLKNELETKLETSEKALKEVKENEEHLKEEKIfl-- 
LEKEATETK<2(2 358 

25 fluery: b58 IAflHRCflRTfiAETEARTLASriLREVERK NEELSVL 

LKAflflVESERAflSD 70b 

+ R EE LA+ L++ E + NEE+S L + + (2 

E+E + 

Sbjct: 351 

30 LNSLRANLESLEKEHEDLAAdLKKYEEfllANKERflYNEEISflLNDEITSTflfiENESIKKK 418 

iSuery: 707 IEHLFflHNRKLESVAEEHEILTKSYIIELLflRN- 
ESTEKKNKDLUITCDSLNKUIET-VKK 7b4 

+ L + ++S +EE L KS ++ L + +KKN+ + + K 

35 IE+ K 

Sbjct: 411 

NDELEGEVKAMKSTSEEflSNLKKSEIDALNLiaiKELKKKNETNEASLLESIKSIESETVK 478 

fluery: 7b5 LNESLKEflN — EKSIAflLIEK EE<JRKEV<2Nt3LVDREHKLAN-LHt2KT — 

40 -KVflEEKI 815 

+ E EN EK +++L +K E + +L K+ L KT 

K+fl EK+ 
Sbjct: 471 

IKELflDECNFKEKEVSELEDKLKASEDKNSKYLELflKESEKIKEELMKTTELKIflLEKV 538 


duery: fllb 

KTL<2KERE])KEETIDILRKELSRTE<2IRKELSIKASSLEV<2KAl2LEGRLEEKESLVKLfl<2 875 
L K +E E ELSR ++K S + + E <2 +L+ ++ K 

+ ++ 

50 Sbjct: 531 TNLSKAKEKSES ELSR— 

LKKTSSEERKNAEE<3LEKLKNEI<2IKN<2AFEKER 588 

fluery: 87b EELNKHSHI1IAI1IHSLSGGKINPETVNL 103 
+ LN+ S I +S + E + L 

55 Sbjct: 581 KLLNEGSSTITflEYSEKINTLEDELIRL bib 

Score = 145 (El-8 bits)-. Expect = 5-le-Ob-, P = 5.1e-0b 
Identities = 51/E4b (E3*)-, Positives = 115/S4b (4b*) 
Query: 634 ASKESRLQ-

DLLETKALALAflADRLIAflHRCflRTflAETEARTLASMLREVERKNEELSVL b'lE 

+ ES +fl L+ K +++S + +R E L + ++E+ 

5 EE ++ 

Sbjct: E07 SKNESSIflLSNLflNKIDSMSflEKE--- 
NFfllERGSIEKNIEflLKKTISDLEflTKEE — II Ebl 

fluery: b<?3 LKAflflVESERAflSDIEHLFflHNRKLESVAEEHEI 

10 LTKSYNELLflRNESTEKKNKD 7M7 

K+ +E+SIL+ ++A++ LTK+ EL 

+ + + 

Sbjct: £bS SKSDSSKDEY-ESfllS- 

LLKEKLETATTANDENVNKISELTKTREELEAELAAYKNLKNE 311 


Query: ?4fl 

L(2ITCDSLNK(JIETVKKLNESLKE(2NEKSIAi2LIEKEEi3RKEVl2N(aLVl>REH<LANLH(JK fiD7 
L+ ++ K ++ VK+ E LKE+ + + E ++fl ++ L E 

+ +L + 
20 Sbjct: 35D 

LETKLETSEKALKEVKENEEHLKEEKKJLEKEATETKraflLNSLRANLESLEKEHEPLAAfl 371 

fluery: flQfi 

TKVflEEKIKTLflKEREDKEETIDILRKELSRTEfllRKELSIKASSLEVflKAflLEGRLEEK fib? 
25 K EE+I KER+ EE I L E++ T+fl + + K LE + 

++ EE + 

Sbjct: 3fl0 LKKYEEfllAN — KERflYNEE- 
ISflLNDEITSTflflENESIKKKNDELEGEVKAIlKSTSEEfl 43b 

30 fluery: fibfi ESLVKLflflEELN fi?T 

+L K + + LN 
Sbjct: M3? SNLKKSEIDALN MMfi 

Score = 137 (EO-b bits)-. Expect = i|.Ee-D5-, P = H.Se-05 
35 Identities = S1/31S (E52) i Positives = 14D/31E (44*) 

fluery: S^fi EELIEKLflSGM VVKDfllCDVRISDIIIDVYEIIKLSTLASK-ESRLflBLLET- 
KALALAflAD bSS 

+EL ++++ ++ +++ S+I D +++ L K E+ LLE+ 

40 K++ 

Sbjct: 14E0 DELEGEVKAMKSTSEEflSNLKKSEI- 
DALNLfllKELKKKNETNEASLLESIKSIESETVK M7fi 

fluery: bSb 

45 RLIAflHRCflRTflAETEARTLASflLREVERKNEELSVLLKAflflVESERAflSDIEHLFflHNR 715 

AC EE L L+EKN+ LK + E + 

L 

Sbjct: H7T IKELflDECNFK-- 

EKEVSELEDKLKASEDKNSKYLELflKESEKIKEELDAKTTELKIflLE 53b 


fluery: 71b 

KLESVAEEHEILTtCSYMELLflRNESTEKKNKBLfllTCDSLNKfllETVKtCLNESLKEflNEK 775 
K+ ++++ E ++S + L++ S E+KN + fl+ fll+ + + 

K NE 

55 Sbjct: 537 KVTNLSKAKE-KSESELSRLKKTSSEERKNAEEflLEKLKNEIfllKN- 
flAFEKERKLLNEG 5TM 
Query: 776 SIAQLIEKEEQRKEVQNQLV--DREHKL-ANLHQKTKVQEEKIKTLQKER-
EDKEETIDI 631 

S E E+ ++++L+ E++L A T+ + EK + E 

E+K+ TI 
5 Sbjct: 515 

SSTITflEYSEKINTLEDELIRLflNENELKAKEIDNTRSELEKVSLSNDELLEEKflNTIKS b5M 

fluery: 632 LRKE-LSRTEfll RKELSIKASS LEVflKAflLEGRLEEK 

ESLVKLflflE--- 67b 

10 L+ E LS ++I K LSI+ S LE K flL E K EL 

KL++E 

Sbjct: b55 

LflDEILSYKDKITRNDEKLLSIERDSKRDLESLKEflLRAAflESKAKVEEGLKKLEEESSK 71M 

15 fluery: 677 ELNKHSWIIAfllHS 610 

EL K n+ + S 
Sbjct: 715 EKAELEKSKEMMKKLES 731 

Score = ISA (1T.S bits)i Expect = B-^e-DMi P = 3.1e-DM 
20 Identities = 80/3Sb (22X)-. Positives = lM6/35b CHI*) 

fluery: SHb LGESIAANNAYRflflETEHIPRKIIPUflSSNHSFPTSIKCLTPHL 

OGVPGLN-I 5*17 

L E + ++ E+ + ++S+ H SIK L L K 

25 G+N + 

Sbjct: 55 

LDEHTflLRDVLETKDKENflTALLEYKSTIHKflEDSIKTLEKELETILSflKKKAEDGINKM 61 

fluery: STfl EELIEKLflSGMVVKDfllCD — 
30 VRISDIUDVYEMKLSTLASKESRLflDLLETKALALAflAD bS5 

+ +L 11 ++C + D+V KT + KE +E 

KA+ + 

Sbjct: 65 GKDLFALSREMflAVEENCKNLflKEKDKSNVNHflK- 
ETKSLKEDIAAKITEIKAIN-ENLE 1H2 


fluery: bSb 

RLIAflHRCflRTflAETEARTLASMLREVERKNEELSVLLKAflflVESERAflSDIEHLFflHNR 715 
++<2C EE ++LE + + + L+ ++ ++ 

+ + N 

40 Sbjct: 1M3 KilKIfl— CNNLSKEKEH— 

ISKELVEYKSRFflSHDNLVAKLTEKLKSLANNYKDflflAENE llfi 

fluery: 71b KLESVAEEHEILTKSYHELLflRN- 
ESTEKKNKDLfllTCDSLNKfllETVKKLNESLKEflNE 77H 
45 L EE + + + Lfl +S ++ ++ fll S+ K IE +KK 

L++ E 

Sbjct: m 

SLIKAVEESKNESSIflLSNLflNKIDSMSflEKENFfllERGSIEKNIEflLKKTISDLEflTKE SS6 

50 fluery: 775 

KSIAflLIEKEEflRKEVflNflLVDREHKLANLHflKTKVflEEKIKTLflKEREDKEETI flBT 

+ I++ + + E ++fl+ + KL KI L K RE+ E 

+ 

Sbjct: EST EIISK — 
55 SDSSKDEYESfllSLLKEKLETATTANDENVNKISELTKTREELEAELAAYKN 315 

fluery: 63D — DILRKELSRTEfllRKELSIKASSLEVfiKAfiLEGRLEE-KESLVKLflfl — 
EELNK-HSH 663 
+ L +L +E+ KE + L+ +K QLE E K+ L L+ E

L K H 

Sbjct: 31b 

LKNELETKLETSEKALKEVKENEEHLKEEKI<3LEKEATETKfl(2LNSLRANLESLEKEHED 37S 


Query: 384 MIAIII flflfl 
+ A + 

Sbjct: 37b LAAflL 3SD 

10 Score = 117 (17-b bits)i Expect = 3.fle-D3-, P = 3.8e-03 
Identities = 50/240 (50X), Positives = 111/240 <4bJC) 

fiuery: b34 

ASKESRL(3DLLETKALALA(3ADRLIAl3HRC(JRT(JAETEARTLASI1LREVERKNEELSVLL b=l3 
15 A E L+ L E + A+ ++ + + E+ L S + + + 

+E+L 

Sbjct: b=H 

AKVEEGLKKLEEESSKEKAELEKSKEMMKKLESTIESNETELKSSMETIRKSDEKLECiSK 7Sfl 

20 Query: b^M KAfl(3VESERAt2 SD- 

IEHLF(JHNRKLESVAEEHEILTKSYI1ELL(3RNESTEKKNKDLfl 741 

K+ + + + Q SD I + + + +E + + I KS EL + 

+ ++ 

Sbjct: 75T 

25 KSAEEDIKNL(3HEKSDLISRINESEKDIEELKSKLRIEAKSSSELETVK(3ELNNA(2E<IR 818 
Query: 750 

ITCDSLNKfllETVKKLNESLKEflNEKSIAflLIEKEEflRKEVflNflLVDREHKLANLHflKTK 801 
+ + N +++ KL + +E +K A++ +E+++ + ++L + E +L 

30 + QK + 

Sbjct: fln VNAEE-NTVLKS— KLEDIERELOKd- 
AEIKSN(2EEKELLTSRLKELE(2ELDST(3<3KA<2 574 

Query: aiO VflEEK 

35 IKTLCKEREDKEETIDILRKELSRTE(3IRKELSIKASSLEV(2KA(3LEGRLE Bb5 

EE+ ++ Q E+ +E +L E + + KE + K V+K 

+ + + 

Sbjct: fl7S KSEEESRAEVRKFflVEKSflLDEKAHLL— 
ETKYNDLVNKEdAlilKRDEDTVKICTT-DSflRfl 131 


Query: abb EKESLVK 872 

EELK 
Sbjct: EIEKLAK 138 

45 Score = 1CH (lb-4 bits)-. Expect = 2.be-02i P = 2-5e-02 
Identities = b4/284 (22JO-, Positives = 135/284 C47X) 

fluery: STB 

EELIEKLlJSGMVVKDinCDVRISDIIIDVYEMKLSTLASKESRLflDLLETKALALA <2A b54 

50 +E+++KL+S ++ +1 E +SE +++L K+ 

++ ++ 
Sbjct: 723 

KEMIIKICLESTIESNETELKSSriETIRKSDEKLEiaSKKSAEEDIKNLflHEKSBLISRINES 782 

55 fluery: bS5 DRLIAUHRCU- 

RT(2AETEARTLASHLREVERKNEELSVLLKA(2<3VESERAl2S]>IEH-LF<a 712 

++ I + + + R +A++ + L ++ +E+ E++ V + V + + 

DIE L 
Sbjct: 783 EKDIEELKSKLRIEAKSSSE-LETVKQELNNAQEKIRVNAEENTVLKSKLE-
DIERELKD AMD 

Guery: 713 HNRKLESVAEEHEILTKSYMELLflRNESTEKK-NKDLfilTCDSLNK- 
5 12IETVKKLNES — 7bA 

+++S EE E+LT EL <2 +ST++K K + + + K fl+E 

+L+E 

Sb jet : flm KdAEIKSNflEEKELLTSRLKELE(3ELDST<2flKAflKSEEESRAEVRKF(2VEK- 
SflLDEKAM A=H 


fluery: 7kT LKEUNEKSIA— ULIEKEEfl— 
RKEVflN(3LVDREHKLANLH(3KTKV(3EEKIKTLflKERE A23 

L E + Q +++E +K +<2 + E KLA K + 

K+K ++R 

15 Sbjct: 1D0 LLETKYNDLVNKEflAWKRDEDTVKKTTDSflRfiEIE- 
KLAKELDNLKAENSKLKEANEDRS ISA 

fluery: A24 DKEETI DILRKELSRTE(2IRKELSIKASSLEV<2KA<2LEGRLEEKE 

20 + ++ + D+ K ++ K+L ++ SS E + E E+ E 

Sbjct: ^ST EIDBLMLLVTDLDEKNAKYRSKL-KDLGVEISSDEEDDEEEEDDEEDDE 
lODb 

Score = lb (14.4 bits)i Expect = Lle+DOi P = b-be-Dl 
25 Identities = 4D/210 <M5C)i Positives = 101/210 (4AJO 

fluery: fc>Al EVERKN— 

EELSVLLKAfiflVESERAflSDIEHLFflHNRKLESVAEEHEILTKSYMELLflRN 73fl 

EEKN+L+++V +++ L++ + ++LK 

30 +L + 

Sbjct: is 

ETELKNVRDSLDE«TflLRDVLETOKEN(2TALLEYKSTIHtC(2EDSIKTLEKELETiLSflK 74 

fluery: 731 ESTE 

35 KKNKDL(2ITCDSLNKt3IETVKKLNESLKE(3NEKSIA(2LIEKEEi3RKEV(3N(JL 714 

+ E K KDL +L+++++ V++ ++L+++ +KS + +++ 

K ++ + 

Sbjct: 7S KKAEDGINKflGKDLF ALSREMflAVEENCKNLflKEKDKSN 

VNHdKETKSLKEDI 127 


fluery: 715 VDREHKLANLHflKTKVCJEEKIKTLdKERED- 
KEETIDILRKELSRTEfllRKELSIKASSL flS3 

+ ++ +++ + + + L KE+E +E ++ + S + K 

L+ K SL 

45 Sbjct: ISA AAKITEIKAINENLEKMKIflCNNLSKEKEHISKELVEYKSRFflSHDNLVAK- 
LTEKLKSL Ifib 

fluery: A54 EVflKAflLEGRLEEKESLVKLflflEELNKHSHfllAHIHS AID 

++ E ESL+K +E N+ S ++ + + 
50 Sbjct: 1A7 ANNYKDMC3A ENESLIKAVEESKNESSIflLSNLflN 22Q 

Score = 52 (7. A bits)-. Expect = 2-De-lD-, P = 2-Oe-lD 
Identities = 31/1L.7 (23X)-. Positives = 74/lb7 (44X) 

55 fluery: 11 LNSVLAGVVCRSSHTDSvTL(JCI«3LL(2KLTYNVKIFYSGANI]>EL- 
ITFLIDHK2SSEDE 157 

LN + + ++ ++ L+ 1+ ++ T +K N E ++ L D 

+++SED+ 
Sbjct: 447

LNLfllKELKKKNETNEASLLESIKSIESETVKIKELUDECNFKEKEVSELEDKLKASEDK 50b 
fluery: ISA - 

5 LKHPCL6LLANLCRHNLSV(3THIKTLSNVKSFYRTLITLLAHSSLTVVVFALSILSSLT Sib 

K L + + L+T T ++ T++ S + + 

S 

Sbjct: SD7 NSKYLELflKESEKIKEELDAKT— 
TELKIdLEKVTNLSKAKEKSESELSRLKKTSSEER Sb3 


fluery: 217 LN-EEVGEKLFHARNI-HflTFflLIFNILINGDGTLTRKYS — VDLLMDLL 
2b2 

N EE EKL + I +<2 F+ +L G T+T++YS ++ L D L 
Sbjct: 5bM KNAEEdLEKLKNEIfllKNdAFEKERKLLNEGSSTITfiEYSEKINTLEDEL 
15 bl3 



Pedant information for DKFZphmel2_12j1, frame 2

Report for DKFZphmel2_12j1.2

[LENGTH] 905
[MW] 102067.81
[pI] 5. as
[HOMOL] TREMBL:SCINTANA_1 Saccharomyces cerevisiae integrin analogue gene, complete cds. 1e-14

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 5e-16
[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 5e-16
[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1322] 1e-10
[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 2e-10
[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 2e-10
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YDR356w] 2e-10
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR095w] 1e-09
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR095w] 1e-09
[FUNCAT] 08.22 cytoskeleton-dependent transport [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-09
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-09
[FUNCAT] 03.25 cytokinesis [S. cerevisiae, YHR023w MYO1 - myosin-1 isoform] 4e-09
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YNL091w] 3e-08
[FUNCAT] 09.25 vacuolar and lysosomal biogenesis [S. cerevisiae, YOR326w] 6e-08
[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YOR326w] 6e-08
[FUNCAT] 09.13 biogenesis of chromosome structure [S. cerevisiae, YLR086w] 8e-08
[FUNCAT] 98 classification not yet clear-cut [S. cerevisiae, YJR134c] 1e-07
[FUNCAT] 06.07 protein modification (glycolsylation, acylation, myristylation, palmitylation, farnesylation and processing) [S. cerevisiae, YKL201c] 9e-07
[FUNCAT] 30.05 organization of centrosome [S. cerevisiae, YIL144w] 4e-06
[FUNCAT] 03.07 pheromone response, mating-type determination, sex-specific proteins [S. cerevisiae, YNL079c] 5e-06
[FUNCAT] 03.01 cell growth [S. cerevisiae, YNL079c] 5e-06
[FUNCAT] 08.99 other intracellular-transport activities [S. cerevisiae, YNL079c] 5e-06
[FUNCAT] 09.04 biogenesis of cytoskeleton [S. cerevisiae, YKL171c] 6e-06
[FUNCAT] 30.02 organization of plasma membrane [S. cerevisiae, YER008c] 8e-06
[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YNL250w] 1e-05
[FUNCAT] 03.13 meiosis [S. cerevisiae, YDR285w] 1e-05
[FUNCAT] 30.13 organization of chromosome structure [S. cerevisiae, YDR285w] 1e-05
[FUNCAT] 11.01 stress response [S. cerevisiae, YPR141c] 2e-05
[FUNCAT] 06.10 assembly of protein complexes [S. cerevisiae, YPR141c] 2e-05
[FUNCAT] 06.01 protein folding and stabilization [S. cerevisiae, YNL227c] 1e-05
[FUNCAT] 05.04 translation (initiation, elongation and termination) [S. cerevisiae, YAL035w] 1e-04
[FUNCAT] 10.05.99 other pheromone response activities [S. cerevisiae, YHR158c] 1e-04
[FUNCAT] O chaperones [M. genitalium, MG355] 2e-04
[FUNCAT] 03.22.01 cell cycle check point proteins [S. cerevisiae, YGL086w] 2e-04
[FUNCAT] 03.10 sporulation and germination [S. cerevisiae, YNL225c] 3e-04
[FUNCAT] R general function prediction [M. jannaschii, MJ1254] 4e-04
[FUNCAT] 08.01 nuclear transport [S. cerevisiae, YPL174c] 4e-04
[FUNCAT] 04.05.01.01 general transcription activities [S. cerevisiae, YHR227c TAFb? - TFIID subunit] 6e-04
[BLOCKS] PR01002E
[BLOCKS] BL01160B Kinesin light chain repeat proteins
[BLOCKS] BL00326D Tropomyosins proteins
[SCOP] d2tmab_ 1.105.4.1.1 Tropomyosin [rabbit (Oryctolagus cuniculus)] 3e-23
[EC] 3.6.1.32 Myosin ATPase 4e-10
[PIRKW] nucleus 5e-09
[PIRKW] phosphotransferase 2e-07
[PIRKW] blocked amino end 1e-06
[PIRKW] duplication 2e-07
[PIRKW] citrulline 3e-08
[PIRKW] tandem repeat 4e-10
[PIRKW] heterodimer 1e-07
[PIRKW] heart 4e-08
[PIRKW] endocytosis 7e-08
[PIRKW] transmembrane protein 1e-14
[PIRKW] serine/threonine-specific protein kinase 2e-07
[PIRKW] cell wall 2e-06
[PIRKW] zinc finger 7e-08
[PIRKW] DNA binding 3e-09
[PIRKW] metal binding 7e-08
[PIRKW] muscle contraction 4e-10
[PIRKW] brain 2e-06
[PIRKW] acetylated amino end 2e-07
[PIRKW] heterotetramer 5e-07
[PIRKW] actin binding 4e-10
[PIRKW] mitosis 1e-08
[PIRKW] microtubule binding 1e-08
[PIRKW] ATP 4e-10
[PIRKW] chromosomal protein 1e-07
[PIRKW] thick filament 1e-10
[PIRKW] phosphoprotein 1e-09
[PIRKW] skeletal muscle 1e-08
[PIRKW] calcium binding 3e-08
[PIRKW] alternative splicing 1e-10
[PIRKW] DNA condensation 1e-07
[PIRKW] coiled coil 1e-14
[PIRKW] P-loop 2e-10
[PIRKW] heptad repeat 5e-09
[PIRKW] methylated amino acid 4e-10
[PIRKW] immunoglobulin receptor Be-07
[PIRKW] peripheral membrane protein 7e-08
[PIRKW] cardiac muscle 4e-08
[PIRKW] hydrolase 4e-10
[PIRKW] microtubule 5e-09
[PIRKW] muscle 4e-08
[PIRKW] membrane protein 5e-09
[PIRKW] EF hand 3e-08
[PIRKW] cell division 1e-06
[PIRKW] cytoskeleton 6e-09
[PIRKW] hair 3e-08
[PIRKW] calmodulin binding 7e-08
[PIRKW] Golgi apparatus 2e-07
[SUPFAM] hypothetical protein YJL074c 5e-09
[SUPFAM] unassigned Ser/Thr or Tyr-specific protein kinases 2e-07
[SUPFAM] myosin motor domain homology 2e-10
[SUPFAM] alpha-actinin actin-binding domain homology 6e-09
[SUPFAM] tropomyosin 2e-08
[SUPFAM] kinesin heavy chain 5e-07
[SUPFAM] plectin 6e-09
[SUPFAM] SAM homology 1e-06
[SUPFAM] trichohyalin 3e-08
[SUPFAM] ribosomal protein S10 homology 6e-09
[SUPFAM] protein kinase C zinc-binding repeat homology 5e-09
[SUPFAM] giantin 7e-08
[SUPFAM] protein kinase homology 2e-07
[SUPFAM] protein 4.1 membrane-binding domain homology 9e-08
[SUPFAM] human early endosome antigen 1 7e-08
[SUPFAM] myosin MYO2 2e-06
[SUPFAM] N5 protein 36-0*1
[SUPFAM] Mycoplasma genitalium hypothetical protein MG218 5e-09
[SUPFAM] myosin heavy chain 2e-10
[SUPFAM] conserved hypothetical PUS protein 3e-09
[SUPFAM] centromere protein E 1e-08
[SUPFAM] calmodulin repeat homology 3e-08
[SUPFAM] hypothetical protein MJ0914 2e-07
[SUPFAM] hypothetical protein MJ1322 3e-Ul
[SUPFAM] pleckstrin repeat homology 5e-09
[SUPFAM] kinesin motor domain homology 1e-08
[SUPFAM] ezrin-1e*M
[PROSITE] LEUCINE_ZIPPER 1
[KW] TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 3.01 %
[KW] COILED_COIL 16.34 %
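The LOW_COMPLEXITY and COILED_COIL keywords are reported as percentages. A minimal sketch of that arithmetic, assuming (the report does not say so) that each value is the share of residues flagged in the corresponding per-residue track of the report, i.e. "x" in the SEG line and "C" in the COILS line. The function name percent_flagged and the example tracks are hypothetical.

```python
def percent_flagged(track: str, flag: str, seq_len: int) -> float:
    """Percentage of residues marked with `flag` in a per-residue annotation track.

    `track` holds one or more concatenated SEG or COILS lines; unflagged
    positions are blank. `seq_len` is the full protein length.
    """
    if seq_len <= 0:
        raise ValueError("sequence length must be positive")
    return round(100.0 * track.count(flag) / seq_len, 2)


# Hypothetical tracks for a 60-residue stretch (not taken from the report).
seg_track = "x" * 10 + " " * 50      # 10 residues masked by SEG
coils_track = " " * 30 + "C" * 15    # 15 residues predicted as coiled coil

print(percent_flagged(seg_track, "x", 60))    # 16.67
print(percent_flagged(coils_track, "C", 60))  # 25.0
```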






SE<2 MDSTACLKSLLLTVS(3YKAVKSEANAT(2LLRHLEVISG<2KLTRLFTSN<2ILTSECLSCLV 

SEG xxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeecceeeeeehhhhhh 
COILS



MEM

se<2 elledpnisaslilsiigllscjlavdietrdclflntynlnsvlagvvcrsshtdsvfltjc 

SEG xxxx   xxxxxxxxxxxxxx

PRD hhhhccccchhhhhhhhchhhhhhhhhhcccccccccceeeeeeeeeeeccccccchhhh 
COILS 



MEM MMMMMMMMMMMMMMMMM 


SEfl IfiLL<JKLTYNVKIFYSGANIDELITFLIDHI<2SSEDELKMPCLGLLANLCRHNLSV<2THI 

SEG 

PRD hhhhhhhcceeeeeecccchhhhhhhhhhhhhhhhhhhccccceeeeeeeecceeeeeee 
COILS 


MEM 

SE<2 KTLSNVKSFYRTLITLLAHSSLTVVVFALSILSSLTLNEEVGEKLFHARNIH(2TF<2LIFN 

SEG 

PRD eeeehhhhhhhhhhhhhhcccccccceeehhhhhhchhhhhhhhhhhhhhhhhhhhhhhh
COILS 



MEM MMMMMMMMMMMMMMMMM 

SEQ ILINGDGTLTRKYSVDLLMDLLKNPKIADYLTRYEHFSSCLHflVLGLLNGKDPDSSSKVL

SEG 

PRD hhcccccceeeehhhhhhhhhhccccchhhhhheeeeehhhhhhhhcccccccccchhhh 
COILS 



MEM

SEA ELLLAFCSVT(2LRHMLT(2MMFE<aSPPGSATLGSHTKCLEPTVALLRIiJLS<2PLD6SENCSV 

SEG i 

PRD hhhhhchhhhhhhhhhhhhhhhccccccccccccceeehhhhhhhhhhhcccccccchhh 
COILS



MEM 

SEfi LALELFKEIFEDVIDAANCSSADRFVTLLLPTILD(3L(2FTE(3NLDEALTRKNVKGIAKAI 

SEG

PRD hhhhhhhhhhhhhhhhcccccchhhhhheeehhhhhhhhhhhhhhhhhhhhhchhhhhhh 
COILS 
MEM

sefl evlltlcgddtlkmhiakilttvkcttlietjfiftygkidlgfgtkvadselcklaadvil 

SEG 

PRD hhhhhhccccchhhhhhhhhhhheeeeeeeeeeecccccccccceeehhhhhhhhhhhhh
COILS 



MEM 

SEQ KTLDLINKLKPLVPGMEVSFYKIL<2DPRLITPLAFALTSDNRE(2Vc2SGLRILLEAAPLPD
SEG 

PRD hhhhhhhhcccccccccccceeeccccccchhhhhhhccccchhhhhhhhhhhhhccccc 
COILS 



MEM

SE(2 FPALVL6ESIAANNAYR<J<3ETEHIPRKMPWt?SSNHSFPTSIKCLTPHLKDGVPGLNIEEL 
SEG 

PRD cceeeeehhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhh 
COILS



MEM 

SEA IEKLflSGMVVKDflICDVRISDIMDVYEMKLSTLASKESRL«2DLLETKALALA(2ADRLIAt3 
SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



MEM

SE(J HRC<2RT<JAETEARTLASf1LREVERKNEELSVLLKA<3flVESERA<2SDIEHLF<3HNRKLESV 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC

MEM 

SE<2 AEEHEILTKSYMELLfiRNESTEKKNKDLflITCDSLNKflIETVKKLNESLKE<3NEKSIA<2L 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SEQ IEICEE(3RKEVl2NflLVDREHKLANLHt2KTKVi3EE<IKTL(2KEREDKEETIDILRKELSRTE
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM

SE<3 (JIRKELSIKASSLEVflKAdLEGRLEEKESLVKLfldEELNKHSHMIAMIHSLSGGKINPET 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
COILS

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 
PSDQQBT



Prosite for DKFZphmel2_12jl.2
331->353 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphmel2_12jl.2)
group: intracellular transport and trafficking

DKFZphmel2_7gl4 encodes a novel 973 amino acid protein with
similarity to the dor (deep orange) protein of Drosophila
melanogaster.

The novel protein is also similar to the vacuolar membrane
protein PEP3 of Saccharomyces cerevisiae, which is involved in
protein sorting. The expression profile is ubiquitous, and a
role in protein transport/targeting is likely.

The new protein can find application in modulating the sorting
of proteins into different compartments.

similarity to DEEP ORANGE (Drosophila melanogaster)
perhaps complete cds and full length
Sequenced by MediGenomix

Locus: unknown

Insert length: 3951 bp

Poly A stretch at pos. 3fl13, polyadenylation signal at pos. 367M

1 GCCCGCGTCA CGGGGGCGGG AGTCAGCTGA GCTGCCGGGG CGAGGTTGGG
51 ATCACCTGGC ACCGGCTGAA GGGAGCCTGT GATTTTTTTG TAGCGGGGGC
101 GGGGAGTAAG GTGCAAGACT GCGCCAGATT CAAGGACGAG GGCTGCCCGA
151 TTATCTCGCT GCATAAGGCA AGAGCAAGAG GATCCTCAGG ATTTTAAAGA
201 GGAGGCGACG GCTGCAGGTT CCCAGGATCT GTCAGAGGCT GGGGAGTTAC
251 AGCTTCCATT CTGGGGCGAC GGGGACCCCG GGGGGGTAGC CCTTTTGTAA
301 TCCCCAGGCC CCGGACAAAG AGCCCAGAGG CCGGGCACCA TGGCGTCCAT
351 CCTGGATGAG TACGAGAACT CGCTGTCCCG CTCGGCCGTC TTGCAGCCCG
401 GCTGCCCTAG CGTGGGCATC CCCCACTCGG GGTATGTGAA TGCCCAGCTG
451 GAGAAGGAAG TGCCCATCTT CACAAAGCAG CGCATTGACT TCACCCCTTC
501 CGAGCGCATT ACCAGTCTTG TCGTCTCCAG CAATCAGCTG TGCATGAGCC
551 TGGGCAAGGA TACACTGCTC CGCATTGACT TGGGCAAGGC AAATGAGCCC
601 AACCACGTGG AGCTGGGACG TAAGGATGAC GCAAAAGTTC ACAAGATGTT
651 CCTTGACCAT ACTGGCTCTC ACCTGCTGAT TGCCCTGAGC AGCACGGAGG
701 TCCTCTACGT GAACCGAAAT GGACAGAAGG TACGGCCACT AGCACGCTGG
751 AAGGGGCAGC TGGTGGAGAG TGTGGGTTGG AACAAGGCAC TGGGCACGGA
801 GAGCAGCACA GGCCCCATCC TGGTCGGGAC TGCCCAAGGC CACATCTTTG
851 AAGCAGAGCT CTCAGCCAGC GAAGGTGGGC TTTTCGGCCC TGCTCCGGAT
901 CTCTACTTCC GCCCATTGTA CGTGCTAAAT GAAGAAGGGG GTCCAGCACC
951 TGTGTGCTCC CTTGAGGCCG AGCGGGGCCC TGATGGGCGT AGCTTTGTTA
1001 TTGCCACCAC TCGGCAGCGC CTCTTCCAGT TCATAGGCCG AGCAGCAGAG
1051 GGGGCTGAGG CCCAGGGTTT CTCAGGGCTC TTTGCAGCTT ACACGGACCA
1101 CCCACCCCCA TTCCGTGAGT TTCCCAGCAA CCTGGGCTAC AGTGAGTTGG
1151 CCTTCTACAC CCCCAAGCTG CGCTCCGCAC CCCGGGCCTT CGCCTGGATG
1201 ATGGGGGATG GTGTGTTGTA TGGGGCATTG GACTGTGGGC GCCCTGACTC
1251 TCTGCTGAGC GAGGAGCGAG TCTGGGAGTA CCCAGAGGGG GTAGGGCCTG
1301 GGGCCAGCCC ACCCCTAGCC ATCGTCTTGA CCCAGTTCCA CTTCCTGCTG
1351 CTACTGGCAG ACCGGGTGGA GGCAGTGTGC ACACTGACCG GGCAGGTGGT
1401 GCTGCGGGAT CACTTCCTGG AGAAATTTGG GCCGCTGAAG CACATGGTGA
1451 AGGACTCCTC CACAGGCCAG CTGTGGGCCT ACACTGAGCG GGCTGTCTTC
1501 CGCTACCACG TGCAACGGGA GGCCCGAGAT GTCTGGCGCA CCTATCTGGA
1551 CATGAACCGC TTCGATCTGG CCAAAGAGTA TTGTCGAGAG CGGCCCGACT
1601 GCCTGGACAC GGTCCTGGCC CGGGAGGCCG ATTTCTGCTT TCGCCAGCGT
1651 CGCTACCTGG AGAGCGCACG CTGCTATGCC CTGACCCAGA GCTACTTTGA
1701 GGAGATTGCC CTCAAGTTCC TGGAGGCCCG ACAGGAGGAG GCTCTGGCTG
1751 AGTTCCTGCA GCGAAAACTG GCCAGTTTGA AGCCAGCCGA ACGTACCCAG
1801 GCCACACTGC TGACCACCTG GCTGACAGAG CTCTACCTGA GCCGGCTTGG
1851 GGCTCTGCAG GGCGACCCAG AGGCCCTGAC TCTCTACCGA GAAACCAAGG
1901 AATGCTTTCG AACCTTCCTC AGCAGCCCCC GCCACAAAGA GTGGCTCTTT
1951 GCCAGCCGGG CCTCTATCCA TGAGCTGCTC GCCAGTCATG GGGACACAGA
2001 ACACATGGTG TACTTTGCAG TGATCATGCA GGACTATGAG CGGGTGGTGG
2051 CTTACCACTG TCAGCACGAG GCCTACGAGG AGGCCCTGGC CGTGCTCGCC
2101 CGCCACCGTG ACCCCCAGCT CTTCTACAAG TTCTCACCCA TCCTCATCCG
2151 TCACATCCCC CGCCAGCTTG TAGATGCCTG GATTGAGATG GGCAGCCGGC
2201 TGGATGCTCG TCAGCTCATT CCTGCCCTGG TGAACTACAG CCAGGGTGGT
2251 GAGGTCCAGC AGGTGAGCCA GGCCATCCGC TACATGGAGT TCTGCGTGAA
2301 CGTGCTGGGG GAGACTGAGC AGGCCATCCA CAACTACCTG CTGTCACTGT
2351 ATGCCCGTGG CCGGCCGGAC TCACTACTGG CCTATCTGGA GCAGGCTGGG
2401 GCCAGCCCCC ACCGGGTGCA TTACGACCTC AAGTATGCGC TGCGGCTCTG
2451 CGCCGAGCAT GGCCACCACC GCGCTTGTGT CCATGTCTAC AAGGTCCTAG
2501 AGCTGTATGA GGAGGCCGTG GACCTGGCCC TGCAGGTGGA TGTGGACCTG
2551 GCCAAGCAGT GTGCAGACCT GCCTGAGGAG GATGAGGAAT TGCGCAAGAA
2601 GCTGTGGCTG AAGATCGCAC GGCACGTGGT GCAGGAAGAG GAAGATGTAC
2651 AGACAGCCAT GGCTTGCCTG GCTAGCTGCC CCTTGCTCAA GATTGAGGAT
2701 GTGCTGCCCT TCTTTCCTGA TTTCGTCACC ATCGACCACT TCAAGGAGGC
2751 GATCTGCAGC TCACTTAAGG CCTACAACCA CCACATCCAG GAGCTGCAGC
2801 GGGAGATGGA AGAGGCTACA GCCAGTGCCC AGCGCATCCG GCGAGACCTG
2851 CAGGAGCTGC GGGGCCGCTA CGGCACTGTG GAGCCCCAGG ACAAATGTGC
2901 CACCTGCGAC TTCCCCCTGC TCAACCGCCC TTTTTACCTC TTCCTCTGTG
2951 GCCATATGTT CCATGCTGAC TGCCTGCTGC AGGCTGTGCG ACCTGGCCTG
3001 CCAGCCTACA AGCAGGCCCG GCTGGAGGAG CTGCAGAGGA AGCTGGGGGC
3051 TGCTCCACCC CCAGCCAAGG GCTCTGCCCG GGCCAAGGAG GCCGAGGGTG
3101 GGGCTGCCAC GGCAGGGCCC AGCCGGGAAC AGCTCAAGGC TGACCTGGAT
3151 GAGTTGGTGG CCGCTGAGTG TGTGTACTGT GGGGAGCTGA TGATCCGCTC
3201 TATCGACCGG CCGTTCATCG ACCCCCAGCG CTACGAGGAG GAGCAGCTCA
3251 GTTGGCTGTA GGAGGGTGTC ACCTTTGATG GGGGTGGGCA ATGGGGAGCA
3301 GTGGCTTGAA CCCACTTGAG AAGGCTGCCT CCTAGGCTCT GCTCAGTCAT
3351 CTTGCAATTG CCACACTGTG ACCACGTTGA CGGGAGTAGA GTAGCGCTGT
3401 TGGCCAGGAG GTGTCAGGTG TGAGTGTATT CTGCCAGCTT TTCATGCTGT
3451 TCTTCAGAGC TGCAGTTATG CCAGACCATC AGCCTGCCTC CCAGTAGAGG
3501 CCCTTCACCT GGAGAAGTCA GAAATCTGAC CCAATTCCAC CCCCTGCCTC
3551 TAGCACCTCT TCTGTCCCTG TCATTCCCCA CACACGTCCT GTTCACCTCG
3601 AGAGAGAGAG AGAGAGAGCA CCTTTCTTCC GTCTGTTCAC TCTGCGGCCT
3651 CTGGAATCCC AGCTCTTCTC TCTCAGAAGA AGCCTTCTCT TCCTCCTGCC
3701 TGTAGGTGTC CCAGAAGTGA GAAGGCAGCC TTCGAAGTCC TGGGCATTGG
3751 GTGAGAAAGT GATGCTAGTT GGGGCATGCT TTTGTGCACA CTCTCTGGGG
3801 CTCCAGTGTG AAGGGTGCCC TGGGGCTGAG GGCCTTGTGG AGGATGGTCG
3851 GTGGTGGTGA TGGAGGTGGA GAGCATTAAA CTGTCTGCAC TGCAAAAAAA
3901 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAGAAAAAA AAAAAAAAAA
3951 A

BLAST Results

Medline entries

^7ElflD37:

Shestopal SA, Makunin IV, Belyaeva ES, Ashburner M, Zhimulev IF.
Mol Gen Genet 1997 Feb 20;253(5):642-8

^2D^3Db:

Robinson JS, Graham TR, Emr SD. A putative zinc finger protein,
Saccharomyces cerevisiae Vps18p, affects late Golgi functions
required for vacuolar protein sorting and efficient alpha-factor
prohormone maturation. Mol Cell Biol 1991 Dec;11(12):5813-24

120M13DS:

Preston RA, Manolson MF, Becherer K, Weidenhammer E, Kirkpatrick D,
Wright R, Jones EW. Isolation and characterization of PEP3, a gene
required for vacuolar biogenesis in Saccharomyces cerevisiae.
Mol Cell Biol 1991 Dec;11(12):5801-12

Peptide information for frame 1 



ORF from 340 bp to 3258 bp; peptide length: 973
Category: similarity to known protein 
Classification: Cellular transport and traffic 

1 MASILDEYEN SLSRSAVLQP GCPSVGIPHS GYVNAQLEKE VPIFTKQRID

51 FTPSERITSL VVSSNQLCMS LGKDTLLRID LGKANEPNHV ELGRKDDAKV

101 HKMFLDHTGS HLLIALSSTE VLYVNRNGQK VRPLARWKGQ LVESVGWNKA

151 LGTESSTGPI LVGTAQGHIF EAELSASEGG LFGPAPDLYF RPLYVLNEEG

201 GPAPVCSLEA ERGPDGRSFV IATTRQRLFQ FIGRAAEGAE AQGFSGLFAA

251 YTDHPPPFRE FPSNLGYSEL AFYTPKLRSA PRAFAWMMGD GVLYGALDCG

301 RPDSLLSEER VWEYPEGVGP GASPPLAIVL TQFHFLLLLA DRVEAVCTLT

351 GQVVLRDHFL EKFGPLKHMV KDSSTGQLWA YTERAVFRYH VQREARDVWR

401 TYLDMNRFDL AKEYCRERPD CLDTVLAREA DFCFRQRRYL ESARCYALTQ

451 SYFEEIALKF LEARQEEALA EFLQRKLASL KPAERTQATL LTTWLTELYL

501 SRLGALQGDP EALTLYRETK ECFRTFLSSP RHKEWLFASR ASIHELLASH

551 GDTEHMVYFA VIMQDYERVV AYHCQHEAYE EALAVLARHR DPQLFYKFSP

601 ILIRHIPRQL VDAWIEMGSR LDARQLIPAL VNYSQGGEVQ QVSQAIRYME

651 FCVNVLGETE QAIHNYLLSL YARGRPDSLL AYLEQAGASP HRVHYDLKYA

701 LRLCAEHGHH RACVHVYKVL ELYEEAVDLA LQVDVDLAKQ CADLPEEDEE

751 LRKKLWLKIA RHVVQEEEDV QTAMACLASC PLLKIEDVLP FFPDFVTIDH

801 FKEAICSSLK AYNHHIQELQ REMEEATASA QRIRRDLQEL RGRYGTVEPQ

851 DKCATCDFPL LNRPFYLFLC GHMFHADCLL QAVRPGLPAY KQARLEELQR

901 KLGAAPPPAK GSARAKEAEG GAATAGPSRE QLKADLDELV AAECVYCGEL

951 MIRSIDRPFI DPQRYEEEQL SWL
BLASTP hits


No BLASTP hits available 

Alert BLASTP hits for DKFZphmel2_7gl4, frame 1
SWISSPROT:DOR_DROME DEEP ORANGE PROTEIN., N = 1, Score = 1279, P = 2.4e-130

PIR:Ain43 vacuolar membrane protein PEP3 - yeast (Saccharomyces
cerevisiae), N = 3, Score = 266, P = 5.1e-27



>SWISSPROT:DOR_DROME DEEP ORANGE PROTEIN.
Length = 1-iODB 

20 

HSPs: 

Score = 1271 (l^l-T bits)-, Expect = 2-4e-13D-, P = 2.4e-13D 

Identities = 303/847 (35%), Positives = 463/847 (54%)

fluery: 13D 

KVRPLARUKGflLVESVGUNKALGTESSTGPILVGTAdJGHIFEAELSASEGGLFGPAPDLY Ifll 
KVR + ++K + +V +N G ESSTGPIL+GT++G IFE EL+ + G 

+ 

Sbjct: 155 KVRRIEKFKDHEITAVAFNPYHGNESSTGPILLGTSRGLIFETELNPAADG- 
HVfl 2DA 

fluery: 110 FRPLYVLNEEGGPA-PVCSLEAERGPDG- 
RSF VI A TTRflRLFflFIGR A AEG AEAflGFSGL 247 
35 + LY L G P P+ L+ R P+ R ++ T+ + ++ F + 

AE + + 

Sbjct: 20=1 RKdLYDLGL-GRPKYPITGLKLLRVPNSSRYIIVVTSPECIYTF" 
I2ETLKAEERSL(2AI 2b5 

(Juery: 24fi FAAYTD— 

HPPPFREFPSNLGYSELAFYTPKLRSAPRAFAUMMGDGVLYGAL — DCGRPD 303 

FA Y P E ++L +S+L F+ P P+ +AU+ G+G+ G L 

+ 

Sbjct: 2bb 

FAGYVSGVCEPHCEERKTDLTFSflLRFFAPPNSKYPKflUAULCGEGIRVGELSIEANSAA 325 

iJuery: 304 SLLSEERV— UEYPEGVGPGA— 
SPPLAIVLT(3FHFLLLLADRVEAVCTLTGl3VVLRD 357 

+L+ + +E + G + P A VLT++H +LL AD V A+C L 

+ V ++ 

Sbjct: 32b 

TLIGNTLINLDFEKTMHLSYGERRLNTPKAFVLTEYHAVLLYADHVRAICLLN<2E<3VY<2E 3flS 

fluery: 35fl HFLE- 
55 KFGPLKHHVKDSSTG(2LlilAYTERAVFRYHV(3REARDVLIRTYLDI1NRFDLAKEYCR 41b 

F E + G + +D TG ++ YT + VF V RE R+VUR YLD 

+++LA + 



WO 01/98454 



PCT/IB01/02050



Sbjct: 36b 

AFDEARVGKPLSIERDELTGSIYVYTVKTVFNLRVTREERNVIilRIYLDKGCJYELATAHAA 115 
fluery: 117 

5 ERPDCLDTVLAREADFCFRURRYLESARCYALT(2SYFEEIALKFLEAR(3EEALAEFL(JRK 17b 

E P+ L VL + AD F Y +A YA T FEE+ LKF+ + 

+ +++++ 
Sbjct: lib 

EDPEHLflLVLCflRADAAFADGSYflVAADYYAETDKSFEEVCLKFMVLPDKRPIINYVKKR SQS 

10 

fluery: 177 LASL--KPAERXXXXXXXXXXXXXXXSRLGAL<2 

GDPEALTLYRETKEC-FRTFLSS SET 

L+ + KP E L L P+ +R + + 

+ F+ 
15 Sbjct: SDb 

LSRVTTKPIIETDELDEDKIINIIKALVIULIDLYLIfllNIIPDKDEEURSSUtJTEYDEFIIIIE Sb5 
(Juery: S3D 

PRHKEWLFASRASIHELLASHGDTEHnVYFAVIMflDYERVVAYHCflHEAYEEALAVLARH Sfll 
20 +R ++ +L+A H D +f1 FA+ + DY+ VVA + E Y 

EAL L 

Sbjct: Sbb 

AHVLSCTR<aNRETVRflLIAEHADPRNMA<2FAIAIGDYDEVVA<2<2LKAECYAEAL(2TLINC2 b2S 

25 fluery: STO 

RDP(2LFYKFSPILIRHIPR(3LVl>AUIEI1GSRLDARaLIPALVNYS(JGGEV(3(2VS(2AIRYn bn 
R+P+LFYK++P LI +P+ VDA + GSRL+ +L+P L+ + E ++ 

+<3 RY + 

Sbjct: bEb RNPELFYKYAPELITRLPKPTVDALHA(2GSRLEVEKLVPTLI- 
30 II"1ENRE<2RE<2T<3 — RYL bfiS 

Query: bSO 

EFCVNVLGETE(2AIHNYLLSLYARGRPDSLLAYLE(2AGASPHRVHYI>LKYALRLCAEHGH 701 
EF + L T AIHN+LL LYA P L+ YLE G VHYD+ YA 

35 ++C + 

Sbjct: bfi3 

EFAIYKLNTTNDAIHNFLLHLYAEHEPKLLMKYLEIflGRDESLVHYDIYYAHKVCTDLDV 712 
fluery: 710 

40 HRACVHVYKVLELYEEAVDLALtWDVDLAKfiCADLPEEDEELRKKLULKIARHVVciEEED 7bT 

A V + +L + AVDLAL D+ LAK+ A P D ++R+KLUL+IA 

H ++ » 

Sbjct: 713 KEARVFLECMLRKUISAVDLALTFDHKLAKETASRPS- 
DSKIRRKLblLRIAYHDIKGTND fidl 

45 

fluery: 77D 

VUTAI1ACLASCPLLKIE]>VLPFFPDFVTIDHFKEAICSSLKAYNHHI(3EL(JREI1EEATAS S2T 
V+ A+ L C LL+IED+LPFF DF ID+FKEAIC +L+ YN 

K3EL(JREPI E T 
50 Sbjct: flD2 

VKKALNLLKECDLLRIEDLLPFFADFEKIDNFKEAICDALRDYN(2RIfiEL(3REI1AETTE(3 fibl 

fluery: 830 

AURIRRDLfiELRGRYGTVEPflDKCATCDFPLLNRPFYLFLCGHMFHADCLLiaAVRPGLPA flfll 
55 R +LA+LR TVE dT> C C+ LL +PF++F+CGH FH+DCL + 

V P L 

Sbjct: flb2 

TDRATAEL(J(3LR(3HSLTVESODTCEICEt1t1LLVKPFFIFICGHKFHSI>CLEKHVVPLLTK =121 
Query: fiTD

YKflARLEELflRKLGAAPPPXXXXXXXXXXXXXXXXXXPSREtJLKADLDELVAAECVYCGE 
+ RL L+++L A R LK 

5 ++++++AA+C++CG 
Sbjct: 122 

EflCRRLGTLK(2(3LEAEV(3T(2A(2P(JSGALSK(2l3AriELl3RKRAALKTEIE]>ILAA»CLFCG- TflO 

fluery: ISO LMIRSIDRPFIDP<2RYEEEi2LSU 172 
10 L+I +ID+PF+D +E+ + U 

Sbjct: ^fll LLISTIDflPFVDD — UE(2VNVELJ 1001 
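In the HSP alignments, the middle line marks an identical residue with the letter itself and a conservative (positive-scoring) substitution with "+". A minimal sketch of how the Identities and Positives counts follow from such a midline, assuming the midline is column-aligned with the Query and Sbjct lines and that percentages are rounded down; hsp_counts and percent are hypothetical helper names, and the example midline is illustrative.

```python
from typing import Tuple


def hsp_counts(midline: str) -> Tuple[int, int]:
    """Count identities and positives in a BLAST alignment midline.

    A letter marks an identical residue, '+' a positive-scoring substitution,
    and a blank a mismatch or gap. Positives include the identities.
    """
    identities = sum(ch.isalpha() for ch in midline)
    positives = identities + midline.count("+")
    return identities, positives


def percent(count: int, aligned_length: int) -> int:
    """Whole-number percentage, rounded down."""
    return 100 * count // aligned_length


# Hypothetical 10-column midline: 4 identities and 3 conservative substitutions.
ident, pos = hsp_counts("KVR + ++K ")
print(ident, pos)                            # 4 7
print(percent(ident, 10), percent(pos, 10))  # 40 70
```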

Score = 268 (HQ. 2 bits), Expect = 3.6e-11, P = 3.6e-11
Identities = 91/281 (32%), Positives = 146/281 (51%)

15 

fluery: 3b (JLEKEVPIFTKURIDF-TPSE— RITSLVVSSNflLCMSLG— - 
KDTLLRIDLGKANEPN flfi 

+ ++E IF++ ++ PS + L VS N L LG + TLLR 

L +A P 
20 Sbjct: 37 

ETDEEDEIFSRHKnVLRVPSNCTGDLMHLAVSRNblLVCLLGTPERTTLLRFFLPRAIPPG lb 

<2uery: fll HVELGRK DDAKVHKMFLDHTGSHLLIAL SST EVLYVN-- 

RNGfl KV 131 

25 L + K+ +f1FLD TG H++IAL S+T + LY++ + 

a KV 
Sbjct: =1? 

EAVLEKYLSGSGYKITRMFLDPTGHHIIIALVPKSATAGVSPDFLYIHCLESPiJAdJflLKV 15b 
30 duery: 132 

RPLARWKGflLVESVGUNKALGTESSTGPILVGTAflGHIFEAELSASEGGLFGPAPDLYFR 111 
R + ++K + +V +N G ESSTGPIL+GT++G IFE EL+ + G 

+ + 

Sbjct: 157 RRIEKFKDHEITAVAFNPYHGNESSTGPILLGTSRGLIFETELNPAADG 

35 HVtJRK 210 

fluery: 112 PLYVLNEEGGPA-PVCSLEAERGPDG- 
RSFVIATTRtJRLFflFIGRAAEGAEAflGFSGLFA 2m 

LY L G P P+ L+ R P+ R ++ T+ + ++ F + AE 

40 + +FA 

Sbjct: ail (3LYDLGL-GRPKYPITGLKLLRVPNSSRYIIVVTSPECIYTF-- 
<2ETLKAEERSL<3AIFA 2b7 

fluery: 25D AYTD--HPPPFREFPSNLGYSELAFYTPKLRSAPRAFAUI1I1GDGVLYGAL 
45 217 

Y P E ++L +S+L F+ P P+ +AU+ G+G+ G L 

Sbjct: Ebfl GYVSGVflEPHCEERKTDLTFSflLRFFAPPNSKYPKtJlilAWLCGEGIRVGEL 



317 



50 



Pedant information for DKFZphmel2_7gl4, frame 1



Report for DKFZphmel2_7gl4.1



[LENGTH] 973

en iii i iioiflb.m 



[pI] 5.72

CH0M0L]] SUISSPR0T:D0R_DR0I1E DEEP ORANGE PROTEIN • le-mS 

[FUNCAT] 30.25 vacuolar and lysosomal organization [S. cerevisiae, YLR148w] 5e-41

[FUNCAT] 06.04 protein targeting, sorting and translocation [S. cerevisiae, YLR148w] 5e-41
[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YLR148w] 5e-41
[BLOCKS! BLDOlObF Galactokinase proteins 
EBLOCKSJ PROIDTMB 
IBL0CKS3 BPD330LB 
IBL0CKS2 PFQObOOB 
EPIRKUJ yeast vacuole le-3T 

CPIRKU3 transmembrane protein le-3T 

EKIO Alpha_Beta 
EKIO L0lil_C0l1PLEXITY 3-3T X 

CKIO C0ILED_C0IL M-A3 V. 



SEQ MASILDEYENSLSRSAVLQPGCPSVGIPHSGYVNAQLEKEVPIFTKQRIDFTPSERITSL

SEG 

PRD ccceeeccccccceeeeecccccceeeecccchhhhhhhhhhhhhhhhhhcccccceeee 
COILS 



SEQ VVSSNQLCMSLGKDTLLRIDLGKANEPNHVELGRKDDAKVHKMFLDHTGSHLLIALSSTE

SEG 

PRD eeccceeeeecccccceeeccccccccceeeeehhhhhhhheeecccccceeeeeeccce 
COILS 

SEQ VLYVNRNGQKVRPLARWKGQLVESVGWNKALGTESSTGPILVGTAQGHIFEAELSASEGG

SEG 

PRD eeeeecccccchhhhhcccceeeeeecccccccccccceeeeecccchhhhhhhhhhccc 
COILS



SEQ LFGPAPDLYFRPLYVLNEEGGPAPVCSLEAERGPDGRSFVIATTRQRLFQFIGRAAEGAE

SEG 

PRD ccccccccccceeeeecccccccceeecccccccccceeeeeehhhhhhhhhhcchhhhh
COILS 



SEQ AQGFSGLFAAYTDHPPPFREFPSNLGYSELAFYTPKLRSAPRAFAWMMGDGVLYGALDCG

SEG

PRD hhhchhhhhhhhccccccccccccccccceeeecccccchhhhhhhhcccceeeeeeccc 
COILS 



SEQ RPDSLLSEERVWEYPEGVGPGASPPLAIVLTQFHFLLLLADRVEAVCTLTGQVVLRDHFL

SEG 

PRD cccccchhhhhhccccccccccccchhhhhhhhhhhhhhhhheeeecccchhhhhhhhhh 
COILS 



SEQ EKFGPLKHMVKDSSTGQLWAYTERAVFRYHVQREARDVWRTYLDMNRFDLAKEYCRERPD

SEG 

PRD hcccccccccccccccceeeehhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhccc 
SEQ CLDTVLAREADFCFRQRRYLESARCYALTQSYFEEIALKFLEARQEEALAEFLQRKLASL

SEG

PRD cchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 
COILS 



SEQ KPAERTQATLLTTWLTELYLSRLGALQGDPEALTLYRETKECFRTFLSSPRHKEWLFASR

SEG xxxxxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



SEQ ASIHELLASHGDTEHMVYFAVIMQDYERVVAYHCQHEAYEEALAVLARHRDPQLFYKFSP

SEG

PRD hhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhcce 
COILS 



SEQ ILIRHIPRQLVDAWIEMGSRLDARQLIPALVNYSQGGEVQQVSQAIRYMEFCVNVLGETE

SEG 

PRD eeeeccccchhhhhhhhccccccccccchhhhhccccchhhhhhhhhhhhhhhhccccch 
COILS



SEQ QAIHNYLLSLYARGRPDSLLAYLEQAGASPHRVHYDLKYALRLCAEHGHHRACVHVYKVL

SEG 

PRD hhhhhhhhhhhhhcccchhhhhhhhcccccccccchhhhhhhhhhhhcccccceeehhhh
COILS 



SEQ ELYEEAVDLALQVDVDLAKQCADLPEEDEELRKKLWLKIARHVVQEEEDVQTAMACLASC
SEG

PRD hhhhhhhhhhhhhchhhhhhhhhccccchhhhhhhhhhhhhhhhhhcchhhhhhhhhhhc 
COILS 



SEQ PLLKIEDVLPFFPDFVTIDHFKEAICSSLKAYNHHIQELQREMEEATASAQRIRRDLQEL
SEG 

PRD ccchhhhhhcccccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEQ RGRYGTVEPQDKCATCDFPLLNRPFYLFLCGHMFHADCLLQAVRPGLPAYKQARLEELQR
SEG 

PRD hhhheeeeccccccccccccccceeeeeeeccchhhhhhhhhhhccchhhhhhhhhhhhh 
COILS 

CCCCCCCC

SEQ KLGAAPPPAKGSARAKEAEGGAATAGPSREQLKADLDELVAAECVYCGELMIRSIDRPFI
SEG xxxxxxxxxxxxxxxxxx 

PRD hhhhhcchhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhccccceeeecccccc 
COILS



SEQ DPQRYEEEQLSWL
SEG

PRD chhhhhhhhhccc 
COILS 



(No Prosite data available for DKFZphmel2_7gl4.1)
(No Pfam data available for DKFZphmel2_7gl4.1)
group: melanoma derived

DKFZphmel2_7kn encodes a novel 234 amino acid protein without
similarity to known proteins.

Transcripts are found in almost every tissue but are most
abundant in kidney and retina.
No informative BLAST results; no predictive PROSITE, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of melanoma-specific genes.



unknown protein

first ATG in frame 1
Sequenced by MediGenomix
Locus: /map="3"

Insert length: 2386 bp

Poly A stretch at pos. 2343, polyadenylation signal at pos. 2323
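The poly A stretch and polyadenylation signal are each reported as a single position. A minimal sketch of how such positions can be located in an insert sequence, assuming 1-based coordinates, a 3'-terminal run of at least 10 A's as the poly A stretch, and AATAAA or ATTAAA within 40 nt upstream of it as the signal. The report does not state its own criteria, so this sketch need not reproduce its numbers exactly; the function names and the example sequence are hypothetical.

```python
import re
from typing import Optional

SIGNALS = ("AATAAA", "ATTAAA")  # canonical hexamer and its most common variant


def poly_a_start(seq: str, min_len: int = 10) -> Optional[int]:
    """1-based start of the 3'-terminal run of at least `min_len` A's, or None."""
    match = re.search(r"A{%d,}$" % min_len, seq.upper())
    return match.start() + 1 if match else None


def poly_a_signal(seq: str, tail_start: int, window: int = 40) -> Optional[int]:
    """1-based start of the signal hexamer closest to the tail within `window` nt upstream."""
    seq = seq.upper()
    lo = max(0, tail_start - 1 - window)
    region = seq[lo:tail_start - 1]
    hits = [lo + region.rfind(sig) + 1 for sig in SIGNALS if sig in region]
    return max(hits) if hits else None


# Hypothetical 3' end: signal at positions 4-9, A tail from position 15.
example = "GGG" + "AATAAA" + "CCGTC" + "A" * 15
tail = poly_a_start(example)
print(tail, poly_a_signal(example, tail))  # 15 4
```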


1 GGCAAAAGTC CAGGAATTAT CTTCATCCCT GGCTATCTTT CTTATATGAA
51 TGGTACAAAA GCGTTGGCGA TTGAGGAGTT TTGCAAATCT CTAGGTCACG
101 CCTGCATAAG GTTTGATTAC TCAGGAGTTG GAAGTTCAGA TGGTAACTCA
151 GAGGAAAGCA CACTGGGGAA ATGGAGAAAA GATGTTCTTT CTATAATTGA
201 TGACTTAGCT GATGGGCCAC AGATTCTTGT TGGATCTAGC CTTGGAGGGT
251 GGCTTATGCT TCATGCTGCA ATTGCACGAC CAGAGAAGGT TGTGGCTCTT
301 ATTGGTGTAG CTACAGCTGC AGATACCTTA GTGACAAAGT TTAATCAGCT
351 TCCTGTTGAG CTAAAAAAGG AAGTAGAGAT GAAAGGTGTG TGGAGCATGC
401 CATCAAAATA CTCTGAAGAA GGAGTTTATA ACGTTCAGTA CAGTTTCATT
451 AAAGAAGCTG AACATCACTG CTTGTTACAT AGCCCAATTC CTGTGAACTG
501 CCCCATAAGA TTGCTCCATG GCATGAAGGA TGACATTGTA CCTTGGCATA
551 CATCAATGCA GGTTGCCGAT CGAGTACTCA GCACAGATGT GGATGTCATC
601 CTCCGAAAAC ACAGTGATCA CCGAATGAGG GAAAAAGCAG ACATTCAACT
651 TCTTGTTTAC ACTATTGATG ACTTAATTGA TAAGCTCTCA ACTATAGTTA
701 ACTAGTATCA CATGTTTAGT TGGTATGTAA ACTAATGTAT CCAGAAGATT
751 GGAAGAGGGA TAAGAAATGA AAGATCCTGA TACTTTAGGT TTTTCCCTTT
801 CCTCTATTTT GTAAATATAA GATGAGTATT ATTTAATGAT GTATTTGCAT
851 AAGTAATGCA AATTGTGAAG AAGGACCAGC TGCTGTTTAG AAAATTTTCT
901 CCTTCCTTCT GTCCTTGATT TTTTTTCATT AAAGTATTTC CTTTTTTTAA
951 TTCAAGAAAA GTTTACCTTT CTTATGCTTA TGTTAGCTAT GCCAGCTCTT
1001 AATTGCATCC TTTTCTAATT AGGATTATTA ATAAAGCGTG AATATTTTGT
1051 TTTTTATTAT AGACAGAAAT TTGTAACATT ACTTCTGATT TGAAAATGCA
1101 ATTCACAAAA TATAGGGAAA TTTTTATTGA AGTAAATTTG AAATGATGGA
1151 GAAATTTCAG AAGCATAATA AAGTTCACAA TAAGGATAAT ACTTTATATA
1201 ATGTATAAAG TATATATAAT ATAATATATA TGTTATATAA ACTGCACATT
1251 ATATTCAAAC TTAAAATTGA GCTTTTTTTT TAAAGGCCCA AAATTGTACA
1301 GTGATACAAG GAGCTATTTC TAAAATTTGG CTTATGTATA ATATATTTAA
1351 ATGGGGAATT TCATCTAAAA CAATGATGTA GTATTTTTAA TATTCTGATT
1401 GGTAAAATTA AAGAGGAAAT TAATCTTTAT ATATTATTTC TTGCAGAAAC
1451 ATTCATTATT TTATTAATAT TGCCCTAAGT ACAACTAGGC AAGTGATTGC
1501 CACCTAAATC AGAAGACGTT CTAAAGTCAG TAAGAAAGTG TGAAATGCTA
1551 GTATAAAGGT TATTTTTTTT CTTTCCTAAA TAACTAAAGT GAGGTGTAGA
1601 TTGAGCCTTG ATATTATTTA GTTAATGTTT TTTATTAATT AATTTTGGCT
1651 GGACTTTATT TAGCTTGATT AGGTTATTAT CTGTCAAACC TTTTAAGTTG
1701 ACAACATGAC TCATATATAT ACATGTGTAT AAGATGAGCA TGTGTCGAAG
1751 ACTTATTCGA CTCATTAATG AGGAAACCAG CAGATAGTAA ACCTGGTTCA
1801 AAGTACAATT CAAGAAACTG AGTATTTATG GGCATTGAAG AAAAAATGTT
1851 GAGATAAAAT TGCTGTGCAG AAAAAAGTGT TAATGAAGCC GACCTGACTA
1901 CTTAACCTTA GAGACCTGCT TTACAAGGTT GGCCCTTGAT TGGCATCTGG
1951 GAACTTGGAG TTCAGGGGGC TTCCACCATT CCCAGAACTG ATCAAAGTAG
2001 CTTACTATAT CTAAACTGTA AAACAATATA GTTTCTCCTG AACACCTGCT
2051 TTCCTTCTGG GAGTCTGGAA TTTTGGTATG TGCCAGGCAG AGACTACCTT
2101 TGTGACCAGC TCCCAGTAAA AACCCCAGGC ACTCAGTCTC TAACAAGCTT
2151 TTCTGGTTGA CAGTGTTTCA CAAGTGCTGT TACAACTGGT TGCTGGGAGA
2201 ATTAAGCTCA TCCTCTGTGA TTCCACTGGC GGAGGATTCT TGGAAGCTTG
2251 CACTTAGTTT CCCCTGACTT CACCCCATGT GTCTTTTTTC CTTTGCTGAT
2301 TTTGTTTTGT ATCCTTTCAC TGTAATAAAT CATGGCCGTG AGCAGAAAAA
2351 AAAAAAAAAA AAAAAAAAAT AAAAAAAAAA AAAAAA



BLAST Results 



25 

No BLAST result 



Medline entries 

30 

No Medline entry 



35 

Peptide information for frame 1 



ORF from 46 bp to 702 bp; peptide length: 219
Category: similarity to unknown protein
Classification: unclassified

1 MNGTKALAIE EFCKSLGHAC IRFDYSGVGS SDGNSEESTL GKWRKDVLSI

51 IDDLADGPQI LVGSSLGGWL MLHAAIARPE KVVALIGVAT AADTLVTKFN

101 QLPVELKKEV EMKGVWSMPS KYSEEGVYNV QYSFIKEAEH HCLLHSPIPV

151 NCPIRLLHGM KDDIVPWHTS MQVADRVLST DVDVILRKHS DHRMREKADI

201 QLLVYTIDDL IDKLSTIVN
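The frame 1 peptide is the conceptual translation of the ORF. A minimal sketch, assuming the standard genetic code (NCBI translation table 1): translating the first 100 nt of the insert listed above from the ATG at positions 46-48 gives MNGTKALAIEEFCKSLGH, the first 18 residues of the peptide. The names translate and CODON_TABLE are illustrative, not part of the report.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str, start: int = 1) -> str:
    """Translate from 1-based position `start` to the first stop codon or the sequence end."""
    seq = "".join(dna.split()).upper()
    protein = []
    for pos in range(start - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X for codons containing N etc.
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Rows 1 and 51 of the insert listing (first 100 nt).
insert_5prime = (
    "GGCAAAAGTC CAGGAATTAT CTTCATCCCT GGCTATCTTT CTTATATGAA"
    "TGGTACAAAA GCGTTGGCGA TTGAGGAGTT TTGCAAATCT CTAGGTCACG"
)
print(translate(insert_5prime, start=46))  # MNGTKALAIEEFCKSLGH
```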





BLASTP hits

No BLASTP hits available
Alert BLASTP hits for DKFZphmel2_7kn, frame 1

No Alert BLASTP hits found 



Pedant information for DKFZphmel2_?kl c l •> frame 1 



Report for DKFZphmel2_7kn.1

[LENGTH] 219
EMIO EMSOT-lfl 
EpI3 S-bl 

[HOMOL] PIR:A71b«ll hypothetical protein RP343 - Rickettsia prowazekii Be-ST
EBLOCKSJ BP0M3SSK 
EBLOCKSJ PRQDflEfiE 
CKIill Alpha_Beta 

SEQ MNGTKALAIEEFCKSLGHACIRFDYSGVGSSDGNSEESTLGKWRKDVLSIIDDLADGPQI
PRD ccchhhhhhhhhhhhccceeeeeeccccccccccccccccchhhhhhhhhhhhhccccee

SEQ LVGSSLGGWLMLHAAIARPEKVVALIGVATAADTLVTKFNQLPVELKKEVEMKGVWSMPS
PRD eeecccchhhhhhhhhhccceeeeeeeeeehhhhhhcccccchhhhhhhhhhhheeeccc

SEQ KYSEEGVYNVQYSFIKEAEHHCLLHSPIPVNCPIRLLHGMKDDIVPWHTSMQVADRVLST
PRD ccccccceeeehhhhhhhhhhhhhhhccccccceeecccccccccccchhhhhhhhhhhh



SEQ DVDVILRKHSDHRMREKADIQLLVYTIDDLIDKLSTIVN
PRD hheeeeeccccchhhhhhhheeeeehhhhhhhhcccccc 
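Each PRD track assigns one predicted secondary-structure state per residue, read here as h for helix, e for strand and c for coil (the report does not define the letters). A minimal sketch that turns such a track into runs with 1-based start and end positions; ss_segments and its min_len parameter are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple


def ss_segments(prd: str, min_len: int = 1) -> List[Tuple[str, int, int]]:
    """Split a PRD track into runs of a single state.

    Returns (state, start, end) with 1-based inclusive positions; runs shorter
    than `min_len` are skipped but still advance the position counter.
    """
    segments = []
    pos = 1
    for state, run in groupby(prd):
        n = sum(1 for _ in run)
        if n >= min_len:
            segments.append((state, pos, pos + n - 1))
        pos += n
    return segments


print(ss_segments("cchhhhce"))
# [('c', 1, 2), ('h', 3, 6), ('c', 7, 7), ('e', 8, 8)]
print(ss_segments("cchhhhce", min_len=2))
# [('c', 1, 2), ('h', 3, 6)]
```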



(No Prosite data available for DKFZphmel2_7kn.1)
(No Pfam data available for DKFZphmel2_7kn.1)



DKFZphtes3_10ilb



group: nucleic acid management



DKFZphtes3_10ilb encodes a novel 742 amino acid protein with
similarity to human ZK1.

ZK1 is one of the early-response genes induced by exposure to
ionizing radiation, and it plays a role in radiation-induced
apoptotic cell death of hematopoietic cells. The novel protein
contains 18 zinc finger domains, an RGD cell attachment sequence
and an ATP/GTP-binding site motif A (P-loop).

The new protein can find application in the diagnosis and therapy
of leukemia predisposition and disease, and in the modulation of
DNA repair.

similarity to ZK1 (Homo sapiens), complete cds


Sequenced by Qiagen
Locus: unknown 
Insert length: 2884 bp

Poly A stretch at pos. 2861, polyadenylation signal at pos. 2835


1 CGGAAATGGA GGGGGTCGCT TTCCTCACCT TCCTCGCTGC GCGGGCGGCG
51 GTTGGTAACC GGTCAGACCA GCCCGAGAGG GACCTGGTGC CTGTACCCAG
101 GCTTCTGTCG CTCTGTCGCC TGCGCTATGC CCTGCTGTAG TCACAGGAGC
151 TGTAGAGAGG ACCCCGGTAC ATCTGAAAGC CGGGAAATGG ACCCAGTGGC
201 CTTTGAGGAT GTGGCTGTGA ACTTCACCCA GGAAGAGTGG ACATTGCTGG
251 ATATTTCCCA GAAGAATCTC TTCAGGGAAG TGATGCTGGA AACTTTCAGG
301 AACCTGACCT CTATAGGAAA AAAATGGAGT GACCAGAACA TTGAATATGA
351 GTACCAAAAC CCCAGAAGAA GCTTCAGGAG TCTCATAGAA GAGAAAGTCA
401 ATGAAATTAA AGAAGACAGT CATTGTGGAG AAACTTTTAC CCAGGTTCCA
451 GATGACAGAC TGAACTTCCA GGAGAAGAAA GCTTCTCCTG AAGTAAAATC
501 ATGTGACAGC TTTGTGTGTG CAGAAGTTGG CATAGGTAAC TCATCTTTTA
551 ATATGAGCAT CAGAGGTGAC ACTGGACACA AGGCATATGA GTATCAGGAA
601 TATGGACCAA AGCCATATAA GTGTCAACAA CCTAAAAATA AGAAAGCCTT
651 CAGGTATCGC CCATCCATTA GAACACAAGA AAGGGATCAC ACTGGAGAGA
701 AACCCTATGC TTGTAAAGTC TGTGGAAAAA CCTTTATTTT CCATTCAAGC
751 ATTCGAAGAC ACATGGTAAT GCACAGTGGG GATGGAACTT ATAAATGTAA
801 ATTTTGTGGG AAAGCCTTCC ATTCTTTCAG TTTATATCTT ATCCATGAAA
851 GAACTCACAC TGGAGAGAAA CCATATGAAT GTAAACAATG TGGTAAATCC
901 TTTACTTATT CTGCTACCCT TCAAATACAT GAAAGAACTC ACACTGGGGA
951 GAAGCCCTAT GAATGTAGCA AATGTGATAA AGCATTTCAT AGTTCTAGTT
1001 CCTATCATAG ACATGAAAGA AGTCACATGG GAGAGAAGCC TTATCAATGC
1051 AAAGAATGTG GAAAAGCATT TGCATATACC AGTTCTCTTC GTAGACATGA
1101 AAGGACCCAC TCTGGGAAAA AACCGTATGA ATGTAAGCAA TATGGGGAAG
1151 GCTTATCCTA TCTTATAAGT TTTCAAACAC ACATAAGAAT GAACTCTGGA
1201 GAAAGACCTT ATAAATGTAA GATATGTGGG AAAGGCTTTT ATTCTGCCAA
1251 GTCATTTCAA ACACATGAAA AAACTCACAC TGGAGAGAAA CGCTATAAAT
1301 GCAAGCAATG TGGTAAAGCC TTCAATCTTT CCAGTTCCTT TCGATATCAT
1351 GAAAGGATTC ACACTGGAGA GAAACCCTAT GAGTGTAAGC AGTGTGGGAA
1401 AGCCTTCAGA TCTGCCTCAC AGCTTCGAGT GCACGGTGGG ACTCACACTG
1451 GAGAGAAACC CTATGAATGT AAGGAATGTG GGAAAGCCTT CAGATCTACC
1501 TCACACCTTC GAGTGCATGG TAGGACTCAT ACTGGAGAGA AACCCTATGA
1551 ATGTAAGGAA TGTGGGAAAG CCTTCAGATA TGTGAAGCAC CTTCAAATTC
1601 ATGAAAGGAC AGAAAAACAC ATAAGAATGC CCTCTGGAGA AAGACCTTAT
1651 AAATGTAGTA TATGTGAGAA AGGCTTTTAT TCTGCCAAGT CATTTCAAAC
1701 ACATGAAAAA ACTCACACTG GAGAGAAACC CTATGAATGC AACCAATGTG
1751 GTAAAGCCTT CAGATGTTGC AATTCCCTTC GATATCATGA AAGGACTCAC
1801 ACTGGAGAGA AACCCTATGA GTGTAAGCAA TGTGGGAAAG CCTTCAGATC
1851 TGCCTCACAC CTTCGAATGC ATGAAAGGAC TCACACTGGA GAGAAACCCT
1901 ATGAGTGTAA GCAATGTGGG AAAGCCTTCA GTTGTGCCTC AAACCTTCGA
1951 AAGCATGGTA GGACTCACAC TGGAGAGAAA CCCTATGAGT GTAAGCAATG
2001 TGGGAAAGCC TTCAGATCTG CCTCAAACCT TCAGATGCAT GAAAGGACTC
2051 ACACTGGAGA GAAACCCTAT GAATGTAAGG AATGCGAAAA AGCATTCTGT
2101 AAATTCTCTT CTTTTCAAAT ACATGAAAGG AAGCACAGAG GAGAGAAGCC
2151 CTATGAATGT AAGCATTGTG GGAATGGATT CACATCTGCC AAGATTCTTC
2201 AAATACATGC AAGAACACAC ATTGGAGAGA AACACTATGA ATGTAAGGAA
2251 TGCGGAAAAG CATTCAATTA TTTTTCTTCC TTGCATATAC ACGCAAGGAC
2301 TCATATGGGA GAGAAGCCAT ATGAATGTAA GGATTGTGGG AAAGCATTCA
2351 GCTAGCCTGG TTCCTTTTAT GGACATGAAT AGACTCACAC TGGAAGGAAG
2401 CACTATGAAT GCAAGCAATG TGGCAAAACT TTCACATTTT CCAGTTCTTT
2451 TCGATATCAT GAAAGGACTC ACACTGGGGA GAAACCCTAT CAATGTAAGC
2501 AGTGTGGGAA AGCCTTCATT CCTTTTACTT CTTTTCAATG TCATGAAAGG
2551 ACTCACACGG GAGAGAAACC CTATGAGTGT ATTCTAGTTC CGTTTGATAT
2601 CATGAAAGGA CTTACACTGG AGTGAAACCC TATGAATGTA AGCAATGTGG
2651 GAAAGCCTTC AGATGTGCCT CGCACCTTCA ACGGCATGGA AGGGTTCACA
2701 CTTGGGAGAA ACTCTATGAA TGTAAGCAGT ATGGGAAAGC CTTCAGATCT
2751 GCCAAGATTC TTTGAATACA GATAATTAAT GTAAACAATT ATCATAAGTA
2801 TACTAACATG TTATTCTTTT TAAATAAGAA GGTATAATAA AATATCCCAT
2851 TGGTTTTATG TATTAAAAAA AAAAAAAAAA AAAA



BLAST Results 

No BLAST result 

Medline entries 



Katoh O, Oguri T, Takahashi T, Takai S, Fujiwara Y, Watanabe H.
ZK1, a novel Kruppel-type zinc finger gene, is induced following
exposure to ionizing radiation and enhances apoptotic cell death
on hematopoietic cells. Biochem Biophys Res Commun 1998 Aug
28;249(3):595-600

15137313:

Wick MJ, Ann DK, Lee NM, Loh HH. Isolation of a cDNA encoding a
novel zinc-finger protein from neuroblastoma x glioma NG108-15
cells. Gene 1995 Jan 23;152(2):227-32



Peptide information for frame 1

ORF from 127 bp to 2352 bp; peptide length: 742

Category: similarity to known protein
Classification: Nucleic acid management
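The ORF coordinates and the peptide length are tied together by codon arithmetic: 2352 - 127 + 1 = 2226 nt, and 2226 / 3 = 742 codons. A minimal sketch of that check, assuming 1-based inclusive coordinates that exclude the stop codon (the convention under which these three figures agree); peptide_length is a hypothetical name.

```python
def peptide_length(orf_start: int, orf_end: int) -> int:
    """Number of residues encoded between 1-based inclusive ORF coordinates.

    Assumes orf_end is the last base of the final sense codon, i.e. the stop
    codon is not included in the ORF coordinates.
    """
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3 != 0:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


print(peptide_length(127, 2352))  # 742
```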

Prosite motifs: RGD (146-148)
ATP_GTP_A (195-202)
ZINC_FINGER_C2H2 (196-216)
ZINC_FINGER_C2H2 (224-244)
ZINC_FINGER_C2H2 (252-272)
ZINC_FINGER_C2H2 (280-300)
ZINC_FINGER_C2H2 (308-325)
ZINC_FINGER_C2H2 (3b4-354)
ZINC_FINGER_C2H2 (392-412)
ZINC_FINGER_C2H2 (420-440)
ZINC_FINGER_C2H2 (445-4b6)
ZINC_FINGER_C2H2 (510-530)
ZINC_FINGER_C2H2 (535-555)
ZINC_FINGER_C2H2 (566-586)
ZINC_FINGER_C2H2 (594-614)
ZINC_FINGER_C2H2 (622-642)
ZINC_FINGER_C2H2 (650-670)
ZINC_FINGER_C2H2 (678-698)
ZINC_FINGER_C2H2 (706-726)
ZINC_FINGER_C2H2 (HTb-M^fl)
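The ZINC_FINGER_C2H2 entries are matches of a PROSITE pattern. A minimal sketch that translates the PS00028 pattern C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H into a regular expression; c2h2_sites and its offset parameter are hypothetical, and re.finditer only returns non-overlapping matches, whereas a PROSITE scan may also report overlapping ones. Applied to the fragment TYKCKFCGKA FHSFSLYLIH ERTHTGEKPY from the peptide listing below, it matches at fragment positions 4-24; if the fragment begins at residue 221 (the third block of the 201 line), this corresponds to the 224-244 entry.

```python
import re
from typing import List, Tuple

# PROSITE PS00028 (ZINC_FINGER_C2H2_1): C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2 = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def c2h2_sites(protein: str, offset: int = 0) -> List[Tuple[int, int]]:
    """1-based inclusive (start, end) of non-overlapping C2H2 matches.

    `offset` is added to every coordinate, for fragments that do not start at residue 1.
    """
    return [(m.start() + 1 + offset, m.end() + offset) for m in C2H2.finditer(protein)]


fragment = "TYKCKFCGKA FHSFSLYLIH ERTHTGEKPY".replace(" ", "")
print(c2h2_sites(fragment))              # [(4, 24)]
print(c2h2_sites(fragment, offset=220))  # [(224, 244)]
```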



5 1 MPCCSHRSCR EDPGTSESRE MDPVAFEDVA VNFTflEEUTL LDISflKNLFR 

51 EVMLETFRNL TSIGKKUSDiS NIEYEYdNPR RSFRSLIEEK VNEIKEDSHC 

101 GETFTfiVPDD RLNFflEKKAS PEVKSCDSFV CAEV6IGNSS FNMSIRGDTG 

151 HKAYEYfiEYG PKPYKCfiflPK NKKAFRYRPS IRTflERDHTG EKPYACKVCG 

501 KTFIFHSSIR RHPIVI1HSGDG TYKCKFCGKA FHSFSLYLIH ERTHTGEKPY 

10 251 ECKflCGKSFT YSATLCIHER THTGEKPYEC SKCDKAFHSS SSYHRHERSH 

301 MGEKPYQCKE CGKAFAYTSS LRRHERTHSG KKPYECKflYG EGLSYLISFfl 

351 THIRMNSGER PYKCKICGKG FYSAKSFdTH EKTHTGEKRY KCKflCGKAFN 

401 LSSSFRYHER IHTGEKPYEC KdCGKAFRSA SflLRVHGGTH TGEKPYECKE 

HS1 CGKAFRSTSH LRVHGRTHTG EKPYECKECG KAFRYVKHLfl IHERTEKHIR 

15 501 MPSGERPYKC SICEKGFYSA KSFflTHEKTH TGEKPYECNfl CGKAFRCCNS 

551 LRYHERTHTG EKPYECKflCG KAFRSASHLR MHERTHTGEK PYECKflCGKA 

bOl FSCASNLRKH GRTHTGEKPY ECKflCGKAFR SASNLflflHER THTGEKPYEC 

b51 KECEKAFCKF SSFUIHERKH RGEKPYECKH CGNGFTSAKI LtJIHARTHIG 

701 EKHYECKECG KAFNYFSSLH IHARTHMGEK PYECKDCGKA FS 

20 



BLASTP hits 

25 No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_10ilbi frame 1 
No Alert BLASTP hits found 

30 



Peptide information for frame 2 



35 ORF from 1703 bp to 258M bp; peptide length: 21M 
Category: questionable ORF 
Classification: no clue 

1 HKKLTLERNP MNATNVVKPS DVAIPFDIMK GLTLERNPMS VSNVGKPSDL 

40 51 PHTFECMKGL TLERNPMSVS NVGKPSVVPfl TFESMVGLTL ERNPMSVSNV 

101 GKPSDLPflTF RCMKGLTLER NPMNVRNAKK HSVNSLLFKY HKGSTEERSP 

151 MNVSIVGMDS HLPRFFKYPIfl EHTLERNTflN VRNAEKHSII FLPCIYTfiGL 

201 IhlERSHMNVR IVGKHSASLV PFMDMNRLTL EGSTHNASNV AKLSHFPVLF 

251 DIMKGLTLGR NPINVSSVG< PSFLLLLFNV NK6LTRERNP MSVF 

45 



BLASTP hits 

50 No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_lQilb-> frame 2 

TREMBL:AF153201_1 product: "zinc finger protein dp"; Homo 
55 sapiens zinc 

finger protein dp mRNA-i complete cds-i N = 1, Score = 225, p a 
M.le-lfl 
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>TREHBL : AF153EQ1_1 product: "zinc finger protein dp n \ Homo 
sapiens zinc 

finger protein dp mRNA-i complete cds- 
5 Length = 4E3 

HSPs: 

Score = E25 (33-fl bits)-. Expect = 4.1e-lfl-i P = M-le-lfl 
10 Identities = S4/E4b (34'/.), Positives = 1EE/E4b (4<tt) 

Uuery: lb VVKPSDVA- 

IPFDHIKGLTLERNPilSVSNVGKPSDLPHTFECMKGLTLERNPMSVSNVGK 74 

V KPS AIFI++LRN+V+VKS T ++G 
15 TLERNP++V +VGK 

Sbjct: 3 VGKPSVRAfllLFCIRESI- 

LGRNHIHVISVAKVSVRIflTLLNIEGSTLERNPINVMSVGK tl 

fluery: 75 

20 PSVVPflTFESnVGLTLERNPriSVSNVGKPSDLPflTFRCUKGLTLERNPMNVRNAKKHSVN 134 

+ 12+ + G LERNP+ V NV KPS d + TLER+ +V 

+A K V 
Sbjct: bS 

LLIRA(2SLFYIRGFILERNPIPVINVAKPSVGFflILLIINEFTLERSLTHVISAIKCLVE 1E1 

25 

fluery: 13S SLLFKYMKGSTEERSPIINVSIVGriDS- 
HLPRFFKYflflEHTLERNTHNVRNAEKHSIIFLP 

+ + + R+PMNV VG P F +++E TLERN H+V 

K + 

30 Sbjct: 1EE DEILLNITEFIflVRNPHNVMNVGKPLVRAPTLF- 
FIRESTLERNLNHVVIVLKALVAVtSI IflD 

Query: 11N 

CIYTlSGLIlilERSHIINVRIVGKHSASLVPFUDMNRLTLEGSTnNASNVAKLSHFPVLFDHI ES3 
35 + + ER+HI1+V V K +++ TL S + A V K S 

+ + 

Sbjct: Ifll 

LLSIKEYTLERNHIIHVISVIKVLVKAflTSLNIREYTLVKSLIIAIVVRKPSVRVLTLFFI E4D 

40 fluery: E54 KGLTLGRN Ebl 

+ TL +N 
Sbjct: EH1 REFTLEKN E4fl 

Score = E15 (32-3 bits)i Expect = Lle-lfan P = l-le-lb 
45 Identities = BE/E4b (33*)-, Positives = 1E4/E4b C5DS) 

(2uery: 44 

VGKPSDLPHTFECnKGLTLERNPMSVSNVGKPSVVPflTFESMVGLTLERNPMSVSNVGKP 1D3 
VGKPS C++ L RN + V +V K SV AT ++' G 

50 TLERNP++V +VGK 
Sbjct: 3 

VGKPS VRAfllLFCIRESILGRNHIHVISVAKVSVRIfiTLLNIEGSTLERNPIN VMS VGKL bE 

(3uery: 104 SDLPflTFRCMKGLTLERNPMNVRNAKKHSVNSLLFKYMKGSTEERSPMNV- 
55 SIVGI1— » 151 

(2+ ++G LERNP+ V N K SV + + T ERS +V S 

+ D 
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Sbjct: b3 

LIRAflSLFYIRGFILERNPIPVINVAKPSVGFfllLLIINEFTLERSLTHVISAIKCLVED 122 

fluery: IbO SHLPRFFKYMflEHTLERNTMNVRNAEKHSIIFLPCIY- 
TflGLIIilERSHMNVRIVGKHSAS 21fl 

L +++fl RN MNV N K ++ P ++ + ER+ 11+V 

IV K + 

Sbjct: 123 EILLNITEFIflV RNPMNVMNVGK- 

PLVRAPTLFFIRESTLERNLUHVVIVLKALVA 177 



fluery: S11 LVPFMDIINRLTLEGSTMNASNVAK- 
LSHFPVLFDIMKGLTLGRNPINVSSVGKPSFLLLL 277 

+ + + TLE + 11+ +V K L +1 + TL ++ I V 

KPS +L 

15 Sbjct: 17fi VfllLLSIKEYTLERNHMHVISVIKVLVKAflTSLNIRE- 
YTLVKSLIIAIVVRKPSVRVLT 23b 

fluery: 27fl FNVMKGLTRERN 26=) 

++ T E+N 
20 Sbjct: 237 LFFIREFTLEKN 2Mfl 

Score = 2D7 (31.1 bits)-. Expect = S.2e-lSi P = S.2e-lS 
Identities = fiD/270 (2150 -, Positives = 12T/270 (172) 

25 fluery: 1 tlKKLTLERNPIINATNVVKPSDVAIPFDI- 

mkgltlernpmsvsnvgkpsdlphtfecmkg ST 

+++ L RN ++ +V K S V I + ++G TLERNP++V +VGK 

+ ++G 

Sbjct: lb IRESILGRNHIHVISVAKVS- 
30 VRIflTLLNIEGSTLERNPINVPISVGKLLIRAflSLFYIRG 7M 

fluery: bD 

LTLERNPMSVSNVGKPSVVPflTFESMVGLTLERNPIlSVSNVGKPSDLPflTFRCflKGLTLE 111 
LERNP+ V NV KPSV fl + TLER+ V + K + 

35 + 

Sbjct: 75 

FILERNPIPVINVAKPSVGFfllLLIINEFTLERSLTHVISAIKCLVEDEILLNITEFIflV 13H 
fluery: 12D 

40 RNPt1NVRNAKKHSVNSLLFKYI1KGSTEERSPI1NVSIV6(1DSHLPRFFKYI1flEHTLERNTM 17T 

RNPMNV N K V + +++ ST ER+ M+V IV + 

++E+TLERN II 
Sbjct: 135 

RNPHNVMNVGKPLVRAPTLFFIRESTLERNLMHVVIVLKALVAVfllLLSIKEYTLERNHM im 

45 

fluery: 160 

NVRNAEKHSIIFLPCIYTflGLIblERSHriNVRIVGKHSASLVPFIIDriNRLTLEGSTIINASN 23T 
+V + K + + + +S + +V K S ++ + TLE 

+ + 
50 Sbjct: 1=15 

HVISVIKVLVKAflTSLNIREYTLVKSLIIAIVVRKPSVRVLTLFFIREFTLEKNYYLCTfl 25H 

fluery: 2M0 VAKLSHFPVLFDIMKGLTL — GRNPINVSSVGK 270 
+K F + J>++K + G P S K 
55 Sbjct: 255 CSK — SFSfllSDLIKHflRIHTGEKPYKCSECRK 2A5 

Score = Ifll (27-2 bits)-. Expect = l-He-ll-. P = 1-Me-ll 
Identities = 71/2b1 (272)-. Positives = llb/2bT (43*) 
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fluery: S 

TLERNPNNATNVVKPSDVAIPFDIMKGLTLERNPNSVSNVGKPSDLPHTFECI1KGLTLER b4 
TLERNP+N +V K A ++G LERNP+ V NV KPS 

5 + TLER 

Sbjct: 4fl 

TLERNPINVMSVGKLLIRA<3SLFYIRGFILERNPIPVINVAKPSVGF(2ILLIINEFTLER 1D7 
Query: b5 

10 NPMSVSNVGKPSVVPaTFESMVGLTLERNPMSVSNVGKPSDLPaTFRCIIKGLTLERNPriN 1E4 

+ V + K V + ++ RNPH+V NVGKP T ++ 

TLERN M+ 
Sbjct: IDS 

SLTHVISAIKCLVEDEILLNITEFIQVRNPPINVIINVGKPLVRAPTLFFIRESTLERNLriH lb? 

15 

(Juery: 15S VRNAKKHSVNSLLFKYMKGSTEERSPflNV- 
SIVGMDSHLPRFFKYMQEHTLERNTMNVRN 183 

V K V + +K T ER+ H+V S++ + ++E+TL 

++ + 

20 Sbjct: Ibfl V VI VLKALV AVAIL LSIKEYTLERNHflH VIS VIKVLVKAfiTSLN - 
IREYTLVKSLIIAIV EEL 

Query: lfl4 

AEKHSIIFLPCIYTtJGLIUERSHUNVRIVGKHSASLVPFMDriNRLTLEGSTriNASNVAKL S43 
25 K S+ L + + E+++ K + + + R+ 

S K 

Sbjct: EE? 

VRKPSVRVLTLFFIREFTLEKNYYLCTQCSKSFSQISDLIKHQRIHTGEKPYKCSECRKA Efib 

30 Query: S44 SHFPVLFDINKGLTLGRNPINVSSVGKPSF E73 

L + + + G+ P GK SF 

Sbjct: Hfl7 FSQCSLLALHQRIHTGKKPNPCDECGK-SF 315 

Score = Ibb (E4-=1 bits)n Expect = fl-Me-lD-i P = fi-Me-lD 
35 Identities = b3/l=J4 C3E*)-, Positives = 8=1/114 (452) 

Query: 100 

VGKPSDLPflTFRCnKGLTLERNPMNVRNAKKHSVNSLLFKYUKGSTEERSPnNVSIVGriD 15=1 
VGKPS Q C++ L RN ++V + K SV ++GST 

40 ER+P+NV VG 
Sbjct: 3 

VGKPSVRAQILFCIRESILGRNHIHVISVAKVSVRIQTLLNIEGSTLERNPINVPISVGKL bE 
Query: ibD 

45 SHLPRFFKYMQEHTLERNTMNVRNAEKHSIIFLPCIYTQGLIUERSHIINVRIVGKHSASL 51T 

+ Y++ LERN + V N K S+ F + ERS +V 

K 

Sbjct: b3 

LIRAQSLFYIRGFILERNPIPVINVAKPSVGFQILLIINEFTLERSLTHVISAIKCLVED 1EE 

50 

Query: BED VPFIIDHNRLTLEGSTtlNASNVAK- 
LSHFPVLFDIMKGLTLGRNPINVSSVGKPSFLLLLF E?fl 

+++ + PIN NV K L P LF I + TL RN ++V V K 

+ + 

55 Sbjct: 1E3 EILLNITEFIQVRNPMNVMNVGKPLVRAPTLFFIRES- 
TLERNLUHVVIVLKALV AVAIL Ifll 

(Juery: E7=i NVMKGLTRERNPMSV £=13 
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+K T ERN 11 V 
Sbjct: 1S2 LSIKEYTLERNHI1HV lib 

Pedant information for DKFZphtes3_10ilbi frame 1 

Report for DKFZphtes3_10ilb.l 



ELENGTH1 734 



Epll T-24 

EH0I10L1 TREI1BL : ABDlltlt.l gene: n ZKl"i product: "Kruppel- 

15 type zinc finger protein"i Homo sapiens ZK1 mRNA for Kruppel-type 
zinc finger proteini complete cds. 0.0 

EFUNCATl 30. ID nuclear organization ES. cerevisiaei YJLOSbcl 
be-33 

EFUNCATl D4 .05.D1.04 transcriptional control ES- cerevisiaei 

20 YJLDSbcl be-33 

EFUNCATl 04. n other transcription activities ES- cerevisiaei 
Y0R113wl Se-24 

EFUNCATJ 04.01.01 rrna synthesis ES- cerevisiaei YPRiabc PZF1 - 
TFIIIAl le-20 

25 EFUNCATl D4-03.01 trna synthesis ES- cerevisiaei YPRlflbc PZF1 - 
TFIIIAl le-20 

EFUNCATl 13.01 homeostasis of other ions ES. cerevisiaei 
YNLQ27wl le-13 

EFUNCATl 11-07 detoxif icaton ES- cerevisiaei YGL2S4wl 2e-12 

30 EFUNCATl 01.02.01 regulation of nitrogen and sulphur utilization 
ES- cerevisiaei YGL2S4wl 2e-12 
EFUNCATl 01-05.04 regulation of carbohydrate utilization ES. 
cerevisiaei YGL201wl 2e-ll 

EFUNCATl D4-05.T : i other mrna-transcription activities ES. 
35 cerevisiaei YERD2flcl 3e-10 

EFUNCATl 11.01 stress response ES. cerevisiaei YKLDb2wl le-01 
EFUNCATl 01-01.04 regulation of amino-acid metabolism ES- 
cerevisiaei YDR2S3cl 5e-0T 

EFUNCATl n unclassified proteins ES- cerevisiaei YBRObbcl 

40 3e-0A 

EFUNCATl 03-07 pheromone responsei mating-type determinationi 
sex-specific proteins ES. cerevisiaei YDR14bcl le-D7 
EFUNCATl 03-25 cytokinesis ES. cerevisiaei YLR131cl 2e-0b 
EBL0CKS1 BL004bb TFIIS zinc ribbon domain proteins 
45 EBLOCKS! BL002MSA Phytochrome chromophore attachment site 
proteins 

EBL0CKS1 DNOnSlB 

EBL0CKS1 PF013b3B 

EBL0CKS1 BL01030 
50 EBL0CKS1 PF000UB 

EBL0CKS1 BL00D2fl Zinc fingeri C2H2 typei domain proteins 

EBL0CKS1 BP04213E 

EBL0CKS1 BPD4213C 

EBL0CKS1 BPD4213B 
55 ESC0P1 d2adr 7. 31. 1- 1-4 ADR1 Esynthetic based on yeast 

(Saccharomyce 2e-05 

EPIRKU1 nucleus le-S3 

EPIRKU! RNA binding 2e-5fi 
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EPIRKUl duplication le-34 

EPIRKUl tandem repeat le-171 

EPIRKUl spermatogenesis 5e-L,E 

EPIRKUl zinc le-lbl 

5 EPIRKUl zinc finger 

EPIRKUl DNA binding 

EPIRKUl metal binding le-lEO 

EPIRKUl phosphoprotein Ee-Sfl 

EPIRKUl leucine zipper le-53 

10 EPIRKUl alternative splicing Ee-Sfl 

EPIRKUl eye lens le-111 

EPIRKUl oocyte . le-lOb 

EPIRKUl transcription factor le-111 

EPIRKUl embryo le-lOb 

15 EPIRKUl segmentation le-34 

EPIRKUl transcription regulation le-lSE 

ESUPFAM1 POZ domain homology 7e-A3 

ESUPFAM3 transcription factor Krueppel le-34 

ESUPFAM1 zinc finger protein ZFP-3b le-173 

20 ESUPFAH1 transcription factor IIIA fle-31 

EPR0SITE1 ATP_GTP_A 1 

EPR0SITE1 RGD 1 

EPR0SITE1 ZINC_FINGER_C2HE Ifl 
EPFAM3 Zinc finger-. CEHE type 

25 EPFAM3 TNFR/NGFR cysteine-rich region 

EKU1 Irregular 

EKU1 3D 

EKU1 LOUCOMPLEXITY 3-57 V. 



30 



SEfl RKURGSLSSPSSLRGRRLVTGtJTSPRGTUCLYPGFCRSVACAflPCCSHRSCREDPGTSES 

SEG • • • xxxxxxxxxxxxxxx 

ImeyF 



35 

SEfl REMDPVAFEDVAVNFTflEEUTLLDISfiKNLFREVMLETFRNLTSIGKKUSDflNIEYEYflN 

SEG 

ImeyF 



40 

SE<2 PRRSFRSLIEEKVNEIKEDSHCGETFTflVPDDRLNFflEKKASPEVKSCDSFVCAEVGIGN 

SEG 

ImeyF 



45 

SEA SSFNHSIRG]>TGHKAYEY(2EYGPKPYKC(2(3PKNKKAFRYRPSIRT(JER]>HTGEKPYACKV 

SEG 

ImeyF 



50 

SEA CGKTFIFHSSIRRHMVHHSGDGTYKCKFCGKAFHSFSLYLIHERTHTGEKPYECK(3CGKS 

SEG 

ImeyF 



55 

SEd FTYSATLfllHERTHTGEKPYECSKCDKAFHSSSSYHRHERSHflGEKPYflCKECGKAFAYT 
SEG xxxxxxxxxxxxx 
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SE(2 SSLRRHERTHSGKKPYECKfiYGEGLSYLISFQTHIRllNSGERPYKCKICGKGFYSAKSFfl 

5 SEG > 

ImeyF 

SE(2 THEKTHTGEKRYKCK<2CGKAFNLSSSFRYHERIHTGEKPYECK<2CGKAFRSASflLRVHGG 

10 SEG 

ImeyF 

SE<2 THTGEKPYECKECGKAFRSTSHLRVHGRTHTGEKPYECKECGKAFRYVKHLlJIHERTEKH 

15 SEG 

ImeyF 

SEC IRMPSGERPYKCSICEKGFYSAKSFlJTHEKTHTGEKPYECNflCGKAFRCCNSLRYHERTH 

20 SEG 

ImeyF 

SECJ TGEKPYECKflCGKAFRSASHLRriHERTHTGEKPYECKflCGKAFSCASNLRKHGRTHTGEK 

25 SEG 

ImeyF 

• • TTTEETTTTTCEETTHHHHHHHHHHHHTTCCEEETTTTEEECCHHHHHHHHHHHHCCC 

SEA PYECKflCGKAFRSASNLflflHERTHTGEKPYECKECEKAFCKFSSFtJIHERKHRGEKPYEC 

30 SEG 

ImeyF 

CEEETTTTEEECCHHHHHHHHHHH 

SEfi KHCGNGFTSAKILfllHARTHIGEKHYECKECGKAFNYFSSLHIHARTHMGEKPYECKDCG 

35 SEG 

ImeyF 

SEfl KAFS 

40 SEG 

ImeyF .... 



45 



Prosite for DKFZphtes3_lDilb.l 



50 



55 



PSOOOlb 


Iflfl- 


>ni 


RGD 






PDOCOODlb 


PS0D017 


337- 


>SM5 


ATP_GTP_A 




PD0CDDD17 


PSDDDSfl 


33fi- 


>S5=1 


ZINC 


FINGER 


CSHS 


PDOCODOSfl 


PSDDDSfl 


Sbb- 


>3fl7 


ZINC 


FINGER. 


_CSHS 


PDOCDDDSfl 


PSDDDSfl 


S^M- 


>31S 


ZINC. 


.FINGER. 


_CSHS 


PDOCDODSfl 


PSDDDSfl 


3SS- 


>3M3 


ZINC. 


FINGER 


.CSHS 


PDOCDODSfl 


PSDDDSfl 


3SD- 


>371 


ZINC. 


.FINGER. 


_CSHS 


PDOCDODSfl 


PSDDD2B 




>M37 


ZINC 


FINGER 


.CSHS 


PDOCDDDSfl 


PSDDDSfl 


M3i4- 


>M55 


ZINC. 


.FINGER. 


.CSHS 


PDOCDODSfl 


PSDDDSfl 


MbS- 


>Mfl3 


ZINC. 


FINGER 


.CSHS 


PDOCDDDSfl 


PSDDDSfl 




>511 


ZINC 


.FINGER. 


.CSHS 


PDOCODDSfl 


PSDDDSfl 


553- 


>S73 


ZINC. 


.FINGER. 


.CSHS 


PDOCDDDSfl 
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PSDQDEfi 


5BD- 


>b01 


ZINC. 


.FINGER. 


_CSHS 


PDOCOOOEB 


psoaoaa 


boa- 


^bET 


ZINC_FINGER_ 


_CEHS 


PDOCODOEfl 


PSDODSfl 


b3b- 


>b57 


ZINC. 


FINGER 


_CEHE 


PDOCODOEfl 


n n n n ~i n 

PSDDDEB 


bb4- 


>bas 


ZINC. 


FINGER. 


C2HE 


PDOCDDOEB 


PSOODEfl 




>713 


ZINC. 


FINGER. 


.CEHE 


PDOCDODSa 


PSDDDEB 


7E0- 


>?m 


ZINC. 


.FINGER. 


CEHE 


PDOCDDOEa 


PSODOEB 


748- 


>7b c l 


ZINC. 


FINGER. 


CEHE 


PDOCDDOEB 


PSDDDSfl 


sia- 


>sm 


ZINC. 


FINGER. 


.CEHE 


PDOCDDDEB 



10 

Pfaro for DKFZphtes3_10ilb-l 



15 Hi1H_NAME TNFR/NGFR cysteine-rich region 



20 



WIN 

fluery 
b7 



*CpeGtYtD-bJNHvpqClpC- - trCePENGflYNvqPCTwTflNTVC* 
C + +++ +++++C C ++C+++ G++++++ ++ V 
3D CLYPGFCRSVACANPC — CSHRSCREDPGTSESRENDP VA 



HNN_NANE Zinc finger-i CEHE type 

25 

Hlltl *CpwPDCgKtFrrwsNLrRHNRTH* 

C++ CGKTF S+ RRHN +H 
fluery E3B CKV — CGKTFIFHSSIRRHflVflH ESS 

30 3E-1S (bits) f: Ebb t: SBb Target: dkf zphtes3_lDilb - 1 
similarity to ZK1 (Homo sapiens) i complete cds- 

Alignment to HUN consensus: 
Query *CpwPDCgKtFrrwsNLrRHF1RTH* 

C++ CGK+F + S + +H RTH 
35 dkfzphtes3 Ebb CKF — CGKAFHSFSLYLIHERTH Eflb 

fluery f: £"=14 t: 314 Target: dkf zphtes3_10ilb • 1 

similarity to ZK1 (Homo sapiens) i complete cds- 
Alignment to HUM consensus: 
40 HMN *CpwPDCgKtFrrwsNLrRHNRTH* 

C+ CGK+F+++ +L++H RTH 
fluery £^4 CKfl — CGKSFTYSATLfllHERTH 314 

34. EE (bits) f: 3EE t: 34E Target: dkf zphtes3_10ilb-l 
45 similarity to ZK1 (Homo sapiens)i complete cds- 
Alignment to HUN consensus: 
fluery *CpwPDCgKtFrrwsNI_rRHNRTH* 

C++ C+K+F ++S++ RH R+H 
dkfzphtes3 3EE CSK — CDKAFHSSSSYHRHERSH 34E 

50 

fluery f: 3S0 t: 37D Target: dkf zphtes3_10ilb - 1 

similarity to ZK1 (Homo sapiens)i complete cds- 

Alignment to HNN consensus: 
HNN *CpwPDCgKtFrrwsNLrRHNRTH* 
55 C++ CGK+F + S+LRRH RTH 

fluery 35D CKE--CGKAFAYTSSLRRHERTH 37D 
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32-01 (bits) f: MOb t: M2b Target: dkf zphtes3_10ill- - 1 
similarity to ZK1 (Homo sapiens)-i complete cds- 

Alignment to HUM consensus: 
(Juery *CpwPDCgKtFrrwsNLrRHMRTH* 
5 C++ CGK F ++ ++++H +TH 

dkf zphtes3 40b CKI — CGKGFYSAKSFlJTHEKTH 42b 

(Juery f: H3M t: 454 Target: dkf zphtes3_10ilb - 1 

similarity to ZK1 (Homo sapiens) ■> complete cds- 
10 Alignment to HUM consensus: 

HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C+ CGK+F+ +S++R H R+H 
(Juery 434 CK(J — CGKAFNLSSSFRYHERIH 454 

15 32-14 (bits) f: 4b2 t: 4fl2 Target: dkf zphtes3_10ilb -1 
similarity to ZK1 (Homo sapiens) ■> complete cds* 

Alignment to HUM consensus: 
(Juery *CpwPDCgKtFrrwsNLrRHMRTH* 

C+ CGK+FR++S+LR H TH 
20 dkfzphtes3 4b2 CKfl — CGKAFRSASflLRVHGGTH 4fl2 

(Juery f: 410 t: S10 Target: dkf zphtes3_10ilb - 1 

similarity to ZK1 (Homo sapiens) t complete cds. 
Alignment to HMM consensus: 
25 HMM " *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ CGK+FR+ S+LR H RTH 
(Juery 410 CKE--CGKAFRSTSHLRVHGRTH SID 

30- bl (bits) f: 515 t: S4D Target: dkf zphtes3_10ilb - 1 
30 similarity to ZK1 (Homo sapiens) ■> complete cds- 

Alignment to HMM consensus: 
(Juery *CpwPDCgKtFrrwsNLrRHMR • -T-H* 

C++ CGK+FR+ +L++H R H 
dkfzphtes3 516 CKE — CGKAFRYVKHLcJIHERTE-KH SID 

35 

(Juery f: 552 t: 572 Target: dkf zphtes3_lDilb-l 

similarity to ZK1 (Homo sapiens)-! complete cds- 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 
40 C++ C+K F ++ ++++H +TH 

(Juery 552 CSI — CEKGFYSAKSFcJTHEKTH 572 

31- 33 (bits) f: 580 t: bDD Target: dkf zphtes3_lDilb - 1 
similarity to ZK1 (Homo sapiens) i complete cds- 

45 Alignment to HMM consensus: 

(Juery *CpwPDCgKtFrrwsNLrRHMRTH* 

C+ CGK+FR +LR H RTH 
dkfzphtes3 SAD CN(3 — CGKAFRCCNSLRYHERTH bDO 

50 (Juery f: bOfl t: b2fl Target: dkf zphtes3_lDilb-l 

similarity to ZK1 (Homo sapiens) i complete cds- 

Alignment to HMM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C+ CGK+FR++S+LR+H RTH 
55 (Juery bOfi CKtJ— CGKAFRSASHLRMHERTH b2fl 

35-30 (bits) f: b3b t: b5b Target: dkf zphtes3_lDilb - 1 
similarity to ZK1 (Homo sapiens) -i complete cds- 
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Alignment to HUM consensus: 
fluery *CpwPDCgKtFrrwsNLrRHMRTH* 

C+ CGK+F+ +SNLR+H RTH 
dkfzphtesB b3b CKU--CGKAFSCASNLRKHGRTH b5b 

5 

Query f: bb4 t: bfl4 Target: dkf zphtes3_10ilb . 1 

similarity to ZK1 (Homo sapiens)i complete cds- 

Alignment to HMM consensus: 
HtlM *CpwPDCgKtFrrwsNLrRHMRTH* 
10 C+ CGK+FR++SNL++H RTH 

fluery bbM CK<2--CGKAFRSASNLflMHERTH bflM 

31.74 (bits) f: b^2 t: 71S Target: dkf zphtes3_10ilb . 1 
similarity to ZK1 (Homo sapiens) i complete cds- 
15 Alignment to HtlM consensus: 

fluery *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ C+K+F+ S++++H R H 
dkfzphtesS b^ CKE — CEKAFCKFSSFfllHERKH 71S 

20 fluery f: 7ED t: 7MD Target: dkf zphtes3_lDilb . 1 

similarity to ZK1 (Homo sapiens) i complete cds- 

Alignment to HUM consensus: 
HMM *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ CG F+++ L++H RTH 
25 fluery 720 CKH — CGNGFTSAKILfllHARTH 74D 

34-flfl (bits) f: 7MB t: 7bfl Target: dkf zphtes3_lDilb . 1 
similarity to ZK1 (Homo sapiens) i complete cds- 
Alignment to HMM consensus: 
30 fluery *CpwPDCgKtFrrwsNLrRHMRTH* 

C++ CGK+F++ S+L +H RTH 
dkfzphtes3 7Mfl CKE — CGKAFNYFSSLHIHARTH 7bfl 



35 



40 



50 



Pedant . information for DKFZphtes3_lQilbi frame E 
Report for DKFZphtes3_10ilb.5 



ILENGTH3 

CMU3 SSDfiS.IA 
EpII 1.17 

45 CHOMOLJ TREMBL : AF1S3ED1_1 product: "zinc finger protein 

dp"i Homo sapiens zinc finger protein dp mRNAi complete cds. 7e- 
17 

EKIdH All_Alpha 



SEfl MKKLTLERNPMNATNVVKPSDVAIPFDIMKGLTLERNPMSVSNVGKPSDLPHTFECMKGL 
PRD cccccccccccceeeeecccccchhhhhccccccccccccccccccccccccchhhhhee 



SEfl TLERNPMSVSNVGKPSVVPflTFESMVGLTLERNPMSVSNVGKPSDLPflTFRCMKGLTLER 

55 PRD ecccccccccccccccchhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhcc 

SEfl NPMNVRNAKKHSVNSLLFKYMKGSTEERSPMNVSIVGMDSHLPRFFKYMflEHTLERNTMN 

PRD cccccccccccccccccccccccccccccccceeeeecccchhhhhhhhhhhhhhhcccc 
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. SE<2 VRNAEKHSIIFLPCIYTflGLIUERSHriNVRIVGKHSASLVPFriDIINRLTLEGSTnNASNV 
PRD chhhhhhheeeccceeechhhhhcccceeeeeccccceeeeccchhhhhhhccccccccc 

5 SEC AKLSHFPVLFDIHKGLTLGRNPINVSSVGKPSFLLLLFNVrilCGLTRERNPIlSVF 
PRD cccccccchhhhhhhhcccccccccccccccchhhhhhhhhccccccccccccc 

(No Prosite data available for DKFZphtes3_lQilb.E) 

10 

(No Pfam data available for DKFZphtes3_lDilt>.2) 
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5 group: testis derived 

DKFZphtes3_10nl0 encodes a novel 5D2 amino acid protein without 
similarity to known proteins. 

10 The mRNA is differentially polyadenylated and the novel protein 
is ubiquitously expressed. 

No informative BLAST resultsi No predictive prosite-i pfam or SCOP 
motif e- 

15 The new protein can find application in studying the expression 
profile of testis-specif ic genes- 



unknown protein 

20 

differentially polyadenylated 
Sequenced by tiiagen 
25 Locus: unknown 

Insert length: SSS1 bp 

Poly A stretch at pos- HS31-1 polyadenylation signal at pos. 2513 

30 

1 CTCACCCTCC CAAGTGGCTG GGACTGCAGG TTCTAAATGG CTTCTAAGAA 

51 GTTGGGTGCA GATTTTCATG GGACTTTCAG TTACCTTGAT GATGTCCCAT 

101 TTAAGACAGG AGACAAATTC AAAACACCAG CTAAAGTTGG TCTACCTATT 

151 GGCTTCTCCT TGCCTGATTG TTTGCAGGTT GTCAGAGAAG TACAGTATGA 

35 201 CTTCTCTTTG GAAAAGAAAA CCATTGAGTG GGCTGAAGAG ATTAAGAAAA 

B51 TCGAAGAAGC CGAGCGGGAA GCAGAGTGCA AAATTGCGGA AGCAGAAGCT 

301 AAAGTGAATT CTAAGAGTGG CCCAGAGGGC GATAGCAAAA TGAGCTTCTC 

351 CAAGACTCAC AGTACAGCCA CAATGCCACC TCCTATTAAC CCCATCCTCG 

M01 CCAGCTTGCA GCACAACAGC ATCCTCACAC CAACTCGGGT CAGCAGTAGT 

40 MSI GCCACGAAAC AGAAAGTTCT CAGCCCACCT CACATAAAGG CGGATTTCAA 

501 TCTTGCTGAC TTTGAGTGTG AAGAAGACCC ATTTGATAAT CTGGAGTTAA 

551 AAACTATTGA TGAGAAGGAA GAGCTGAGAA ATATTCTGGT AGGAACCACT 

bOl GGACCCATTA TGGCTCAGTT ATTGGACAAT AACTTGCCCA GGGGAGGCTC 

LSI TGGGTCTGTG TTACAGGATG AGGAGGTCCT GGCATCCTTG GAACGGGCAA 

45 701 CCCTAGATTT CAAGCCTCTT CATAAACCCA ATGGCTTTAT AACCTTACCA 

751 CAGTTGGGCA ACTGTGAAAA GATGTCACTG TCTTCCAAAG TGTCCCTCCC 

flOl CCCTATACCT GCAGTAAGCA ATATCAAATC CCTGTCTTTC CCCAAACTTG 

651 ACTCTGATGA CAGCAATCAG AAGACAGCCA AGCTGGCGAG CACTTTCCAT 

101 AGCACATCCT GCCTCCGCAA TGGCACGTTC CAGAATTCCC TAAAGCCTTC 

50 151 CACCCAAAGC AGTGCCAGTG AGCTCAATGG GCATCACACT CTTGGGCTTT 

1001 CAGCTTTGAA CTTGGACAGT GGCACAGAGA TGCCAGCCCT GACATCCTCC 

1051 CAGATGCCTT CCCTCTCTGT TTTGTCTGTG TGCACAGAGG AATCATCACC 

1101 TCCAAATACT GGTCCCACGG TCACCCCTCC TAATTTCTCA GTGTCACAAG 

1151 TGCCCAACAT GCCCAGCTGT CCCCAGGCCT ATTCTGAACT GCAGATGCTG 

55 1201 TCCCCCAGCG AGCGGCAGTG TGTGGAGACG GTGGTCAACA TGGGCTACTC 

1251 GTACGAGTGT GTCCTCAGAG CCATGAAGAA GAAAGGAGAG AATATTGAGC 

1301 AGATTCTCGA CTATCTCTTT GCACATGGAC AGCTTTGTGA GAAGGGCTTC 

1351 GACCCTCTTT TAGTGGAAGA GGCTCTGGAA ATGCACCAGT GTTCAGAAGA 
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15 



20 



25 



mOJ AAAGATGATG 
mSl TTGAGCTGAA 
1SD1 GACAATGCTT 
1SS1 GGCCCTGCCT 
IbOl GAGCCCACCT 
lb51 GGGTTAGAAG 
1701 GCCCTGAGCT 
1751 ACTGTCCTGG 
ISDl CTTCCCACTT 
IflSl TATGTCCTCA 
1101 GGGGCGGGf^G 
1151 TTCCCCTGAG 
2001 AGATTCTTCC 
2051 TTAACACTGG 
21D1 TATGGGGCCC 
2151 ACCCCAGCCT 
22D1 CAGGGTTTTA 
2251 CCTTCCCAGC 
2301 CAGCACTAAC 
2351 CTGCTTTAGG 
2M01 TCTGGTTTGT 
2MS1 GGATATACAG 
2501 AGAGAGAACT 
2551 A 



GAGTTTCTTC 
AGACATTAAG 
TGGAAGACCT 
AGGCCCTGCC 
GTGGGGAAAG 
GTCAGGTGTG 
GGGG^GGTGG 
CTCCTTCCGT 
CAGCCCTCCG 
GCTGAAGCCT 
GGCCAGACTC 
ACTGGTTGAC 
AGGGTTTTAT 
TTCTGCAATA 
AGAGTTTGCC 
GTTTCTTTTG 
GAGCCCCTGC 
ACATTGAATG 
TCCACCTCTG 
ATGACACAAT 
TTTGTATTAT 
TCTTGAATCT 
ACTAATAAAA 



AGTTAATGAG 
GAAGTTTTGC 
CATGGCTCGG 
GCAGAACCAC 
AGAAGGGGCA 
GAGACTGCTC 
GGAAGATTCG 
ATTAAACGCA 
GAGAGACTAC 
GGCCTAGTTG 
AGTGCTGCTG 
TGAACTCCAG 
TTTTTCCCCT 
TCTCTGAGGT 
TTTTCTGCCA 
GCTTGGTTTG 
TCTAGGAAAC 
GGTAAGCAGA 
TTCTCCTTGA 
GAATAACACC 
GTTGTACATC 
AAAATAATTT 
ATCTAAAAGG 



CAAATTTAAG 
TATTACACAA 
GCAGGAGCCA 
CATCCCTGGG 
GCTTCCGGAT 
GCCAGTCTCT 
GGCATGTGAG 
TTTGCATTTT 
CCTAGTCTTT 
CTGAGAGGGG 
TGGAGCTAGG 
TCAAGTTGAG 
CCTAACAAAG 
GCAAAGAATG 
GGCAGTCACC 
GACCACAGTC 
AGTTTAAGAA 
CAGGCCATGA 
ACAGCTTCCC 
TAGTCATAGA 
ATTAAAGATC 
GCTAACTATT 
TAAAAAAAAA 



GAGATGGGCT 
CAATGACCAG 
GCTGAGACCA 
AGGCCCTGCA 
TTTCTTTTGG 
GTGAGCCTAG 
TGCCCCCAGA 
GAGAAGTGTC 
CTGGGGTGTT 
CTGGGGAGAT 
TGCTTCCCCC 
TTCAAGTGAA 
TCTCATAGTG 
CACTTTTCCC 
ACGCTTCCCT 
CTCTGCTACC 
ATCATT6GCC 
TTTAGTTGGC 
CTCCAGCCCA 
AATCAGTCTC 
TAAATACAAA 
TTGATTCTTC 
AAAAAAAAAA 



35 



BLAST Results 



30 No BLAST result 



Medline entries 



No Medline entry 



40 



Peptide information for frame 1 



ORF from 37 bp to 15H2 bpi peptide length: 502 
Category: putative protein 
45 Classification: unclassified 



1 MASKKLGADF 
51 EVflYDFSLEK 
101 KMSFSKTHST 

50 151 KADFNLADFE 
201 PRGGSGSVLfl 
251 KVSLPPIPAV 
301 SLKPSTUSSA 
351 EESSPPNTGP 

55 M01 NMGYSYECVL 
M51 (3CSEEKMMEF 
501 AS 



HGTFSYLDDV 
KTIEUAEEIK 
ATMPPPINPI 
CEEDPFDNLE 
DEEVLASLER 
SNIKSLSFPK 
SELNGHHTLG 
TVTPPNFSVS 
RAMKKKGENI 
LflLMSKFKEM 



PFKTGDKFKT 
KIEEAEREAE 
LASLGJHNSIL 
LKTIDEKEEL 
ATLDFKPLHK 
LDSDDSNflKT 
LSALNLDSGT 
(SVPNMPSCPfl 
EfllLDYLFAH 
GFELKDIKEV 



PAKVGLPIGF 
CKIAEAEAKV 
TPTRVSSSAT 
RNILVGTTGP 
PNGFITLPflL 
AKLASTFHST 
EMPALTSSUM 
AYSELfiMLSP 
GflLCEKGFDP 
LLLHNNDUDN 



SLPDCLdVVR 
NSKSGPEGDS 
KdKVLSPPHI 
IMAOLLDNNL 
GNCEKMSLSS 
SCLRNGTFflN 
PSLSVLSVCT 
SERflCVETVV 
LLVEEALEMH 
ALEDLMARAG 
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BLASTP hits 

5 No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_lDnlDn frame 1 
No Alert BLASTP hits found 

10 

Pedant information for DKFZphtes3_10nlDi frame 1 
Report for DKFZphtes3_lDnl0.1 

15 

ELENGTHH 50B 
imi SSDfl3-7fl 
EpIJ 5.02 
20 EBLOCKSJ PR010S3D 
IBLOCKSJ BL013DLB 
CKIO All_Alpha 

IKUl L0ld_C0flPLEXITY fl.57 V. 

25 

SEC! MASKKLGADFHGTFSYLDDVPFKTGDKFKTPAKVGLPIGFSLPDCLflVVREVlSYDFSLEK 

SEG xx 

PRD cccccccccccccccccccccccccccccccccccccccccccchhhhhhhhhhcccchh 

30 SEI2 KTIEUAEEIKKIEEAEREAECKIAEAEAKVNSKSGPEGDSKflSFSKTHSTATPIPPPINPI 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccchhh 

SEC LASL<2HNSILTPTRVSSSATKl2KVLSPPHIKADFNLADFECEEDPFDNLELKTIDEKEEL 

35 SEG 

PRD hhhhcccccccccccccchhhhhcccccchhhhhcccccccccccccccccchhhhhhhh 

SE(2 RNILVGTTGPIMAG?LLDNNLPRGGSGSVL<2DEEVLASLERATLDFKPLHKPNGFITLP(2L 

SEG 

40 PRD hhhhccccchhhhhhhhcccccccccccchhhhhhhhhhhhhcccccccccccccccccc 

SEfl GNCEKflSLSSKVSLPPIPAVSNIKSLSFPKLDSDDSNflKTAKLASTFHSTSCLRNGTFCJN 

SEG 

PRD ccccccccccccccccccccccccccccccccccccchhhhhhhhhcccccccccccccc 

45 

SE(J SLKPSTflSSASELNGHHTLGLSALNLDSGTEHPALTSSflHPSLSVLSVCTEESSPPNTGP 

SEG xxxxxx 

PRD ccccccccccccccccccccceeecccccccccccccccccceeeeeeeccccccccccc 

50 SEfl TVTPPNFSVSdVPNMPSCPfiAYSELflMLSPSERCKVETVVNMGYSYECVLRAtlKKKGENI 

SEG xxxxxx 

PRD cccccccccccccccccccchhhhhhhcccccchhhhhhhccccchhhhhhhhhhccchh 

SEfl EfllLDYLFAHGflLCEKGFDPLLVEEALEMHflCSEEKMMEFLflLMSKFKEIlGFELOIKEV 

55 SEG 

PRD hhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEA LLLHNNDdDNALEDLHARAGAS 
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WO 01/98454 

SEG 

PRD hhcccccchhhhhhhhhhhccc 



PCT/IB01/02050 



5 (No Prosite data available for DKFZphtes3_10nlO • 1 ) 

(No Pfam data available for DKFZphtes3_lQnl0-l) 
DKFZphtes3_llal7 



10 

group: transmembrane protein 

DKFZphtes3_llal? encodes a novel MEfl amino acid protein without 
15 similarity to known proteins- 

The novel protein contains E transmembrane regions and one 
leucine zipper- The protein is ubiquitously expressed with higher 
abundance in stomachi brain and testis. 
20 No informative BLAST resultsi No predictive prositei pfam or SCOP 
motif e- 

The new protein can find application in studying the expression 
profile of testis-specific genes and as a new marker for 
25 testicular cells- 



unknown protein 

30 Pedant: TRANSMEMBRANE 5 

perhaps differential polyadenylation 

Sequenced by diagen 

35 Locus: unknown 

Insert length: 2511 bp 

Poly A stretch at pos- 2570i polyadenylation signal at pos- ESMfl 

40 

1 CTCTCCTGCG CCCTCTGGAG GAAGTGAGAA 
51 CCGCCTGGTA TCTGGGCTCC AGGCCACCGA 

101 GAGCCCTTAG CACACACCTC CCCCACAGGT 

151 ACCTGCAGCC GTGGCGGTAC GCGCCTGACA 
45 201 TCCCAGCCCC GGTGTGTGTC GGAGAAATGG 

E51 CCTGCTGATG TACACCAAGT TGTTTGTGGG 

301 GCACAGACCT GGTCAGCCCC AAGCACGCGC 

351 AAAGTCTTTG CCCAGCCCAA CCTGGCTGAG 

i»01 GCTATTCCTG GAGCCAGAGC TGGTCATCCC 
50 MSI TCACGGCCCC CACATTCACT GGGAGCTTCC 

501 GTCACTGATG CCTCCTTCAA GGTGAAGAGC 

551 CCAGGACTGC AAGTACACCC CGATGTTTGG 

1.01 TCCTGCGCCT CGCTCAGCTC ATCACACAGG 

L.51 ATCTCCGACC AGTGTGCGGA GAGCCCGGCT 
55 7D1 GCTGGGCTTT AGCTCCATGG ACACCAATGG 

751 TGGACGAGAT GGGGCAAGAC AGTGTCCGGA 

flQl AAGGCCCTGG AGTACCTGCG CCAGATATTC 

651 CAGGCAGTTC ACACTCGCCT TGGGCACCAC 



GAGTCAGTCC 
GTATTTGGCC 
CCTGGAGATG 
AGCAGGCTCC 
GCACCCTTTG 
CTTTCTGAAC 
TCATGGTGTT 
ATGATTCAGA 
CCACCGCCAG 
TGTCACCCTG 
CACGTCTACA 
GCCCGAGGCC 
CCAAACACAC 
GGCCACTCCT 
CTCCTACACA 
AGACAGATGA 
CGGCTCAGCG 
CCAGGATGAG 



CACCCAGCTG 
CCCAGCCACG 
TGGCTGAGCT 
GGGCAGCGAC 
TCCAGGAGAA 
CGCGCGCTCC 
CCGAGTGGCC 
AAGGTGAGCA 
CACCGACTCT 
GCCACCAGCG 
GCCTGGAGGG 
CGCACCCTGG 
AGCCAAGTCC 
TCCTCTCATG 
GCCAACGACC 
ATACCTGGAG 
AAGCGCAGCT 
AATGGAAAAA 
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101 AGCAACTCCC CGACTGCATC 
151 CTGGGGCGGT ACCAGATCAT 
1001 CCAGGGGGAC CCGGAGCTGC 
10S1 TGGTCCGCAC ACTCTTTAGG 
5 1101 GGACAGATGG CGGCTCTGTG 
1151 TCGCTACCAC CTCACAGAAC 
1201 CTGTGGGGCG GAGGCAGGTG 
1SS1 CTGCGCTTCC TGGGCAGTTA 
1301 CTTCGTGGCC TCTCTGTTCT 

10 1351 TCACCCTGGG CTATGTCCTC 
moi CGGGGGAAGC TGCACCAGCC 
1451 TGGAGGGATT TGCCACACAG 
1501 GCCCCTCCAG GAGGGAACAC 
1551 CTGCAGCCGC AGAGGCATCT 

15 ltOl GTGGGCCCCA GCAAAGGAGT 
lb51 GACAGCGCAG AGCTCAGCGC 
1701 CGGCCAAGCC AGCTCTCGGG 
1751 TGCTGCACCA AGCTTGGGAG 
lflOl CTCCCCACTG GCTGGCCTTG 

20 1651 CTCTCTGTGT GGGACCAGGA 
1=101 ATCAAAGTTT CTAGAGTTGT 
nSl TTCACTGTGA GGGGCGTTCT 
5001 CCGTAGAATT CCATGTTTCA 
2051 CTTCACCGCA GACCCCAAGC 

25 2101 ACATCGCGGA CCCCTGTGCC 
2151 TTGGGGACAC TGCTGGGTCG 
2201 GCAGCCAAAG ATGGTCAGAA 
2251 AAAGACATGG CAACCGTTCA 
2301 ACAGATTCTC TGACAGAAAC 

30 2351 ACAGGCGACA TGCGAGGGAG 
2401 AGCCTTTGTT TTCGGTGTGG 
2451 AATTGTAAAT GTTGGGCTTT 
2501 TTTTAGGGCC TGTGACCAAA 
2551 TAAACATGTC CTTGCTTCTG 

35 



GTGGGTGAGG ACGGACTCAT CCTTACGCCC 
CAATGGGCTG CGAAGGTTTG AAATTGAGTA 
AGCCCATCCG GAGCTATGAG ATCGCCAGCT 
CTGTCGTCTG CCATCAACCA CAGATTTGCA 
TTCCCGGGAT GACTTCCTCG GCAGCTTCTG 
CTGGGCTGGC CAGCAGGCAC CTGCTGAGCC 
GCCGGCCACA CCCGCGGCCC CAGGCTCAGC 
CCGGACGCTG GTCTCGCTGC TGCTGGCCTT 
GCGTCGGGCC CCTCCCATGC ACGCTGCTGC 
TACGCCTCTG CCATGACACT GCTGACCGAG 
CTGAAGGTGT CAGCTGCCTT CAGAGCAGGC 
CCCCACCCTT GGGCTGAGAG GACCTGGGAA 
GGTCATCCTC GGGCTTCTGG AGCGGGGTTC 
GGAGGAAACG CAACCAAGAA AGGAAGGCAG 
AGCTGCCAGG GCTCAACAGC TACGCTCTGT 
CGGCCTTTCC CTCCCTCCGC CAAGGACTCA 
GCCTTTTTTC CAGTGCCCAT TTGGCTACTC 
CCAGCCTGCC AACAGCCACC TGGGCCTGGC 
AGGTTGGCAG AGTGGGTTGT GGCGCTTCCT 
CAGTGGCTTA AGTCTCCACT CCAGGAAAGA 
GAGAAAACCA GAGAGTGGCT GTCCTGATTC 
TCATGTTCTC CCAGCTGTTC CAAGACTGGG 
GGAGCCTAAG ACCCTCCCAG AGCCCAGGGG 
CATTGAGCAC ATCACCCAAA GCAGTGGCCA 
TTGTCACAGA TGGGTGCTGG TCCTCAGGCG 
ATGGGGTCGG ATTCTGCCAG TTTCTGCTCT 
GCATTGTCAC TTCAGTAACA TCAAGTGCTC 
GTGGTACTTA AGTATTCAAA ATATACAACT 
CAGCACGGGG TCTTCACCTT CATTCACCCC 
AACAGCATCT CAGTGGTGAT TTCCAAACCA 
GGTTTTGGGG GTTTGCTTTA ATGTTTTTGA 
TTATTTTGAT GTAAACTGAG AATAATGGCA 
AATGAAGCTT GTAACGACCA TGGATCTGAA 
AAAAAAAAAA AAAAAAAAAA A 



BLAST Results 



40 Entry AF052134 from database EP1BLNEU : 
Homo sapiens clone 235S5 mRNA sequence- 
Score = 57b5-. P = 2.16-254-. identities = 1155/llSb 
3' UTR 



45 



Medline entries 



50 No Medline entry 



Peptide information for frame 3 
55 

ORF from 13A bp to 1421 bpi peptide length: 426 
Category: putative protein 



-282- 
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Classification: Transmembrane proteins unclassified 
Prosite motifs: LEUCINE_ZIPPER (10>4<-M25) 



5 1 MULSYLdPUR YAPDKflAPGS DSflPRCVSEK UAPFVflENLL MYTKLFVGFL 

SI NRALRTDLVS PKHALMVFRV AKVFAfiPNLA EMIflKGEflLF LEPELVIPHR 

1D1 (2HRLFTAPTF TGSFLSPUPP AVTDASFKVK SHVYSLEGflD CKYTPMFGPE 

151 ARTLVLRLAfl LITtfAKHTAfC SISDflCAESP AGHSFLSULG FSSMDTNGSY 

SD1 TANDLDEMGfl DSVRKTDEYL EKALEYLRCI FRLSEA(JLR(J FTLALGTT(2I> 

10 551 ENGKKflLPDC IVGEDGLILT PLGRY<2IING LRRFEIEYflG DPELflPIRSY 

301 EIASLVRTLF RLSSAINHRF AGflMAALCSR DDFLGSFCRY HLTEPGLASR 

351 HLLSPVGRRfi VAGHTRGPRL SLRFLGSYRT LVSLLLAFFV ASLFCVGPLP 

M01 CTLLLTLGYV LYASAMTLLT ERGKLHflP 

15 



BLASTP hits 

No. BLASTP hits available 

20 

Alert BLASTP hits for DKFZphtes3_llal?-, frame 3 
No Alert BLASTP hits found 
25 Pedant information for DKFZphtes3_llal7n frame 3 



Report for DKFZphtes3_llal?.3 

30 

ILENGThO MSfl 
CMU3 H&2m.73 

Epu a. is 

EPROSITEJ LEUCINE_ZIPPER 1 
35 EKliU TRANSMEMBRANE S 

EKIiO L0ld_C0MPLEXITY 7-Mfl '/. 

SE<2 MWLSYL(2PURYAPDKc2APGSDS(JPRCVSEKIi)APFV<2ENLLMYTKLFVGFLNRALRTDLVS 
40 SEG 

PRD cccccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhccc 
MEM 

SE(J PKHALMVFRVAKVFA<2PNLAEMI(2KGEi2LFLEPELVIPHR<2HRLFTAPTFTGSFLSPlilPP 
45 SEG 

PRD cchhhhhhhhhhhhcccchhhhhhhccceeeccceeeccccccccccccccccccccccc 
MEM 

SE<2 AVTDASFKVKSHVYSLEGi2DCKYTPMFGPEARTLVLRLA(3LIT<2AKHTAICSISD<2CAESP 
50 SEG 

PRD cccccccccccceeeccccccccccccccchhhhhhhhhhhhhhhhcccccccccccccc 
MEM 

SEfl AGHSFLSULGFSSMDTNGSYTANDLDEMG<2DSVRKTDEYLEKALEYLRl2IFRLSEA(2LR(2 
55 SEG 

PRD ccceeecccccccccccccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhh 
MEM 
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SEU FTLALGTTl3DENGKK(2LPDCIVGEDGLILTPLGRY(3IINGLRRFEIEY<2GDPEL(2PIRSY 

SEG 

PRD hhhhhhccccccccccccceeecccccccccccceeeecchhhhheeecccccccccchh 

MEM 

SE<2 EIASLVRTLFRLSSAINHRFAGflMAALCSRDDFLGSFCRYHLTEPGLASRHLLSPVGRRfi 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccceeeeeeccccchhhhhcccccccc 

riEn 

SEU VAGHTRGPRLSLRFLGSYRTLVSLLLAFFVASLFCVGPLPCTLLLTLGYVLYASAMTLLT 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PR]) cccccccccccccccccchhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhh 

mem mii 11 mm nnri nun nil mm nri nnnnnnnnnnMMMnMnM- 



15 



SE<3 
SEG 
PRD 
MEM 



ERGKLHCP 



hhhccccc 



20 



Prosite for DKFZphtes3_llal7 • 3 



25 



PSDDOS^ 



LEUCINE_ZIPPER 



PD0C0QD2T 



(No Pfam data available for DKFZphtes3_llal?.3) 
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5 group: signal transduction 

DKFZphtes3_llc22 encodes a novel 462 amino acid protein with 
partial similarity to mouse PC32b- 

10 The novel protein contains UD-repeats- UD-repeat proteins are 

known as regulatory elements in a large variety of pathways- The 
repeats form a propeller like strcture-i which serves as a 
platform for protein/protein interaction. The new protein is 
ubiquitously expressed-) indicating that it takes an essential 

15 regulatory function in the cell- 

The new protein can find application in modulating/blocking of 
regulatory pathways- 

20 

similarity to mouse PC32b 
perhaps complete cds- 

contains UD-Repeats : cf. BLASTX-S37bm 
25 perhaps differential polyadenylation 

Sequenced by fliagen 

Locus: /map="lq23. 2-24-3" 

30 

Insert length: n52 bp 

Poly A stretch at pos- lT32n polyadenylation signal at pos- 1112 



35 1 GAAGCA AGTG AGGTTGCACA AAGCAATAGA GGACGAGGAA GATCTCGACC 

SI CAGAGGTGGA ACAAGTCAAT CAGATATTTC AACTCTTCCT ACGGTCCCAT 

101 CAAGTCCTGA TTTGGAAGTG AGTGAAACTG CAATGGAAGT AGATACTCCA 

151 GCTGAACAAT TTCTTCAGCC TTCTACATCC TCTACAATGT CAGCTCAGGC 

201 TCATTCGACA TCATCTCCCA CAGAAAGCCC TCATTCTACT CCTTTGCTAT 

40 251 CTTCTCCAGA TAGTGAACAA AGGCAGTCTG TTGAGGCATC TGGACACCAC 

301 ACACATCATC AGTCTGATTC TCCTTCTTCT GTGGTTAACA AACAGCTCGG 

351 ATCCATGTCA CTTGACGAGC AACAGGATAA CAATAATGAA AAGCTGAGCC 

401 CCAAACCAGG GACAGGTGAA CCAGTTTTAA GTTTGCACTA CAGCACAGAA 

451 GGAACAACTA CAAGCACAAT AAAACTGAAC TTTACAGATG AATGGAGCAG 

45 SD1 TATAGCATCA AGTTCTAGAG GAATTGGGAG CCATTGCAAA TCTGAGGGTC 

551 AGGAGGAATC TTTCGTCCCA CAGAGCTCAG TGCAACCACC AGAAGGA6AC 

bOl AGTGAAACAA AAGCTCCTGA AGAATCATCA GAGGATGTGA CAAAATATCA 

b51 GGAAGGAGTA TCTGCAGAAA ACCCAGTTGA GAACCATATC AATATAACAC 

701 AATCAGATAA GTTCACAGCC AAGCCATTGG ATTCCAACTC AGGAGAAAGA 

50 751 AATGACCTCA ATCTTGATCG CTCTTGTGGG GTTCCAGAAG AATCTGCTTC 

flOl ATCTGAAAAA GCCAAGGAAC CAGAAACTTC AGATCAGACT AGCACTGAGA 

flSl GTGCTACCAA TGAAAATAAC ACCAATCCTG AGCCTCAGTT CCAAACAGAA 

=101 GCCACTGGGC CTTCAGCTCA TGAAGAAACA TCCACCAGGG ACTCTGCTCT 

151 TCAGGACACA GATGACAGTG ATGATGACCC AGTCCTGATC CCAGGTGCAA 

55 1001 GGTATCGAGC AGGACCTGGT GATAGACGCT CTGCTGTTGC CCGTATTCAG 

1051 GAGTTCTTCA GACGGAGAAA AGAAAGGAAA GAAATGGAAG AATTGGATAC 

1101 TTTGAACATT AGAAGGCCGC TAGTAAAAAT GGTTTATAAA GGCCATCGCA 

1151 ACTCCAGGAC AATGATAAAA GAAGCCAATT TCTGGGGTGC TAACTTTGTA 
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1201 ATGAGTGGTT CTGACTGTGG CCACATTTTC ATCTGGGATC GGCACACTGC 
1251 TGAGCATTTG ATGCTTCTGG AAGCTGATAA TCATGTGGTA AACTGCCTGC 
1301 AGCCACATCC GTTTGACCCA ATTTTAGCCT CATCTGGCAT AGATTATGAC 
1351 ATAAAGATCT GGTCACCATT AGAAGAGTCA AGGATTTTTA ACCGAAAACT 
5 1M01 TGCTGATGAA GTTATAACTC GAAACGAACT CATGCTGGAA GAAACTAGAA 
mSl ACACCATTAC AGTTCCAGCC TCTTTCATGT TGAGGATGTT GGCTTCACTT 
1SD1 AATCATATCC GAGCTGACCG GTTGGAGGGT GACAGATCAG AAGGCTCTGG 
1551 TCAAGAGAAT GAAAATGAGG ATGAGGAATA ATAAACTCTT TTTGGCAAGC 
IbOl ACTTAAATGT TCTGAAATTT GTATAAGACA TTTATTATAT TTTTTTCTTT 

10 lb51 ACAGAGCTTT AGTGCAATTT TAAGGTTATG GTTTTTGGAG TTTTTCCCTT 
1701 TTTTTGGGAT AACCTAACAT TGGTTTGGAA TGATTGTGTG CATGAATTTG 
1751 GGAGATTGTA TAAAACAAAA CTAGCAGAAT GTTTTTAAAA CTTTTTGCCG 
IflOl TGTATGAGGA GTGCTAGAAA ATGCAAAGTG CAATATTTTC CCTAACCTTC 
IflSl AAATGTGGGA GCTTGGATCA ATGTTGAAGA ATAATTTTCA TCATAGTGAA 

15 nOl AATGTTGGTT CAAATAAATT TCTACACTTG CCAAAAAAAA AAAAAAAAAA 

nsi AA 



BLAST Results 

20 

Entry HS702J1 C J from database EHBL : 

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone 
702J11 

25 Score = 20M3-. P = 5-ae-252-. identities = M25/MM5 
10 exons matching Bp 31b-n32 

Entry HS53blMA from database EHBL : 
human STS WI-b3M7. 
30 Score = 1203-. P = 1.5e-H7-. identities = 2M7/252 

Entry HS703Hm from database EMBLNEbl: 

Human DNA sequence from clone 703H1M on chromosome lq23.2-2 t 4-3 
Score = 1307-. P = l-le-51-, identities = 2b3/2bS 
35 2 exons matching Bp 1-31L 



Medline entries 

40 - 

c 1302b3fi3: 

Bergsagel PL-i Timblin CRi Eckhardt L-. Laskov Rt Kuehl UPl.i 
Sequence and 

45 expression of a murine cDNA encoding PC32bi a novel 

gene expressed in plasmacytomas but not normal plasma cells- 
Oncogene 

1=^2 Octi7(lD) :2DS1-tH 

50 



Peptide information for frame 1 



ORF from 133 bp to 157fi bpi peptide length: Mfl2 
Category: similarity to known protein 
Classification: Protein management 
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WO 01/98454 
Prosite motifs: MYB_1 (llO-llfl) 



PCT/1B01/02050 



1 NEVDTPAEflF LflPSTSSTMS AflAHSTSSPT ESPHSTPLLS 

5 51 EASGHHTHHfl SDSPSSVVNK (3LGSHSLDE(3 (2DNNNEKLSP 

101 LHYSTEGTTT STIKLNFTDE USSIASSSRG IGSHCKSEG<2 

151 (2PPEGDSETK APEESSEDVT KYflEGVSAEN PVENHINITfl 

B01 SNSGERNDLN LDRSCGVPEE SASSEKAKEP ETSDfiTSTES 

251 P<2F(JTEATGP SAHEETSTRD SALiJDTDDSD DDPVLIPGAR 

10 301 AVARIUEFFR RRKERKEMEE LDTLNIRRPL VKMvYKGHRN 

351 UGANFVMSGS DCGHIFIUDR HTAEHLMLLE ADNHVVNCLd 

401 SGIDYDIKIIil SPLEESRIFN RKLADEVITR NELtlLEETRN 

451 RULASLNHIR ADRLEGDRSE GSGUENENED EE 

15 

BLASTP hits 

No BLASTP hits available 

20 

Alert BLASTP hits for DKFZphtes3_Hc22-. frame 1 

TREMBLNEU : HSDbb31_l gene: "H32b"i Human (H3Hb) mRNAi complete 
cds.-. N 

25 = li Score = 2?fl, P = 4e-22 

PIR:S37b=14 gene PC3Bb protein - mouse-. N = 1-. Score = 2b5-i P = 
B.le-20 

30 PIR:T05b7b hypothetical protein F20M13.40 - Arabidopsis thaliana-. 
N = 

1-. Score = 240-, P = b-3e-lfi 



35 >TREMBLNEU:HS0bb31_l gene: "H32b"i Human (H32b) mRNAi complete 
cds • 

Length = 5=17 

HSPs: 

40 

Score = 27fl (41-7 bits)-, Expect = 4-0e-22-, P = 4-0e-22 
Identities = b3/14fi (42X)-, Positives = =14/143 (b3X) 

fluery: 335 YKGHRNSRTIUKEANFWG-- 
45 ANFVMSGSDCGHIFIUDRHTAEHLIILLEADNH-VVNCLflP 3=11 

YKGHRN+ T +K NF+G + FV+SGSDCGHIF+U++ + + + +E D 

VVNCL+P 

Sbjct: 42fl YKGHRNNAT- 

VKGVNFYGPKSEFvVSGSDCGHIFLIdEKSSCflllflFMEGDKGGVVNCLEP 4flb 

50 

(Juery: 3=i2 HPFDPILASSGIDYDIKIUSPLEESRIFNRKLADEVITRNELriLEE- 
TRNTITVPASFML 450 

HP P+LA+SG+D+D+KIU+P E+ L 1 VI +N+ +E + + 

+ S ML 

55 Sbjct: 467 HPHLPVLATSGLDHDVKIUAPTAEASTELTGLKD- 
VIKKNKRERDEDSLHflTDLFDSHriL 545 

fluery: 451 RPILASLNHIRADRLEGD-RSEGSGfiENENEDE 4fll 
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SPDSEflRflSV 
KPGTGEPVLS 
EESFVPflSSV 
SDKFTAKPLD 
ATNENNTNPE 
YRAGPGDRRS 
SRTMIKEANF 
PHPFDPILAS 
TITVPASFML 



WO 01/98454 



PCT/1B01/02050 



10 



55 



L ++H+R R R G G + + DE 

Sbjct: 51b UFL — HHHLRflRRHHRRliJREPGVGATDADSDE 575 



Pedant information for DKFZphtes3_llc22-. frame 1 
Report for DKFZphtes3_Hc22 . 1 



CLENGTHJ 4B2 
EMIO 53470.12 
[pi] 4-72 

CHOMOLJ PIR:T041bl hypothetical protein T12J5-10 - 

15 Arabidopsis thaliana 2e-22 

EFUNCATJ 30-01 organization of intracellular transport vesicles 
ICS • cerevisiae-. YDLmScJ Me-DS 

CFUNCAT3 08-07 vesicular transport (golgi network-, etc.) ICS • 

cerevisiae-. YDL145cJ Me-05 
20 CFUNCAT3 T1 unclassified proteins ES- cerevisiae-. YCL031w3 

2e-04 

ESUPFAPD ti)D repeat homology 4e-21 
EPROSITEJ MYB_1 1 
EKliO Alpha_Beta 
25 EKLO LOIil_C0l1PLEXITY 17-01 * 

SE(2 l1EVDTPAEflFLtJPSTSSTnSAl3AHSTSSPTESPHSTPLLSSPI>SE(2R(2SVEASGHHTHH(3 

SEG xxxxxxxxxxxxxxxxxxxxx 

30 PRD cccccceeeeecccccceeeeeeccccccccccccceeecccccchhhhhhccccceeec 

SE<3 SDSPSSVVNK(2LGSnSLDE<3<2DNNNEKLSPKPGTGEPVLSLHYSTEGTTTSTIKLNFTDE 

SEG 

PRD ccccceeeeecccccccccccccccccccccccccccceeeeccccccccccceeeeccc 

35 

SEfl WSSIASSSRGIGSHCKSEG(2EESFVP<2SSV<2PPEGDSETKAPEESSEDVTKY<3EGVSAEN 

SEG • xxxxxxxxxxxx - 

PRD cccccccccccccccccccceeeeeccccccccccccccccccccccccccccccccccc 

40 SEC PVENHINIT<3SDKFTAKPLDSNSGERNDLNLDRSCGVPEESASSEKAKEPETSD(3TSTES 

SEG xxxxxxxxxxxxxx - • - - xxxxx 

PRD ccceeeeeecccccccccccccccccccccccccccccchhhhhhhhccccccccccccc 

SEA ATNENNTNPEP(!F<2TEATGPSAHEETSTRDSAL(JDTDDSDDDPVLIPGARYRAGPGDRRS 

45 SEG xxxxxxxx — - xxxxxxxx 

PRD cccccccccccceeeeeccccccccccccccccccccccccccccccccccccccccchh 

SE<3 AVARIlJEFFRRRKERKEMEELDTLNIRRPLVKIIVYKGHRNSRTrilKEANFIiJGANFVilSGS 

SEG xxxxxxxxxxxxxx 

50 PRD hhhhhhhhhhhhhhhhhhhhhhhhccccceeeeeeccccccceeeeccccccceeeeccc 

SEtJ DCGHIFIUDRHTAEHLMLLEADNHVVNCLflPHPFDPILASSGIDYDIKIWSPLEESRIFN 

SEG 

PRD ccceeeeeecchhhhhhhhhcccceeeecccccccceeecccccceeeecccchhhhhhh 



SE<2 RKLADEVITRNELMLEETRNTITVPASFMLRMLASLNHIRADRLEGDRSEGSGdENENED 

SEG 

PRD hhchhhhhhhhhhhhhhhhcceeecchhhhhhhhhhchhhhhhccccccccccccccccc 
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SE(2 EE 

SEG • - 

PRD cc 

5 



Prosite for DKFZphtes3_llc2E - 1 
10 PSDDD37 mD-»»n f1YB_l PD0CD0037 



(No Pfam data available for DKFZphtes3_llc22 - 1 ) 



-289- 
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DKFZphtes3_lld21 



PCT/1B01/02050 



5 group: signal transduction 

DKFZphtes3_lld21 encodes a novel T22 acid protein and contains 
the full coding sequence of the human Nedd-H-like ubiquitin- 
protein ligase- 

10 

The novel protein contains four (tfU domains- The lilU/rspS/UUP 
domain has been shown to bind proteins with particular proline- 
motifs-i and thus resembles somewhat SH3 domains* It is 
frequently associated with other domains typical for proteins in 
IS signal transduction processes* There is also a ubiquitin-protein 
ligase activity reported. The protein is believed to play an 
important role in protein-degradation pathways* 

The new protein can find application in diagnosis of diseases due 
20 to unnormal protein degradation like muscular dystrophy or 
multiple sclerosis as well as in modulating the half life of 
specific proteins and in expression profiling* 



25 similarity to Nedd-M-like ubiquitin-protein ligase (Homo sapiens) 
Sequenced by fiiagen 
Locus: unknown 

30 

Insert length: 33fl2 bp 

Poly A stretch at pos. 33b2i polyadenylation signal at pos* 33*15 



35 1 ATTTTGGGAC ATGGCCACTG CTTCACCAAG GTCTGATACT AGTAATAACC 

51 ACAGTGGAAG GTTGCAGTTA CAGGTAACTG TTTCTAGTGC CAAACTTAAA 

1D1 AGAAAAAAGA ACTGGTTCGG AACAGCAATA TATACAGAAG TAGTTGTAGA 

151 TGGAGAAATT ACGAAAACAG CAAAATCCAG TAGTTCTTCT AATCCAAAAT 

201 GGGATGAACA GCTAACTGTA AATGTTAC6C CACAGACTAC ATTGGAATTT 

40 251 CAAGTTTGGA GCCATCGCAC TTTAAAAGCA GATGCTTTAT TAGGAAAAGC 

301 AACGATAGAT TTGAAACAAG CTCTGTTGAT ACACAATAGA AAATTGGAAA 

351 GAGTGAAAGA ACAATTAAAA CTTTCCTTGG AAAACAAGAA TGGCATAGCA 

101 CAAACTGGTG AATTGACAGT TGTGCTTGAT GGATTGGTGA TTGAGCAAGA 

-151 AAATATAACA AACTGCAGCT CATCTCCAAC CATAGAAATA CAGGAAAATG 

45 501 GTGATGCCTT ACATGAAAAT GGAGAGCCTT CAGCAAGGAC AACTGCCAGG 

551 TTGGCTGTTG AAGGCACGAA TGGAATAGAT AATCATGTAC CTACAAGCAC 

t.01 TCTAGTCCAA AACTCATGCT GCTCGTATGT AGTTAATGGA GACAACACAC 

b51 CTTCATCTCC GTCTCAGGTT GCTGCCAGAC CCAAAAATAC ACCAGCTCCA 

701 AAACCACTCG CATCTGAGCC TGCCGATGAC ACTGTTAATG GAGAATCATC 

50 751 CTCATTTGCA CCAACTGATA ATGCGTCTGT CACGGGTACT CCAGTAGTGT 

601 CTGAAGAAAA TGCCTTGTCT CCAAATTGCA CTAGTACTAC TGTTGAAGAT 

flSl CCTCCAGTTC AAGAAATACT GACTTCCTCA GAAAACAATG AATGTATTCC 

•=101 TTCTACCAGT GCAGAATTG6 AATCTGAAGC TAGAAGTATA TTAGAGCCTG 

151 ACACCTCTAA TTCTAGAAGT AGTTCTGCTT TTGAAGCAGC CAAATCAAGA 

55 1001 CAGCCAGATG GGTGTATGGA TCCTGTACGG CAGCAGTCTG GGAATGCCAA 

1051 CACAGAAACC TTGCCATCAG GGTGGGAACA AAGAAAAGAT CCTCATGGTA 

1101 GAACCTATTA TGTGGATCAT AATACTCGAA CTACCACATG GGAGAGACCA 

1151 CAACCTTTAC CTCCAGGTTG GGAAAGAAGA GTTGATGATC GTAGAAGAGT 
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1201 TTATTATGTG GATCATAACA CCAGAACAAC AACGTGGCAG CGGCCTACCA 

1251 TGGAATCTGT CCGAAATTTT GAACAGTGGC AATCTCAGCG GAACCAATTG 

13D1 CAGGGAGCTA TGCAACAGTT TAACCAACGA TACCTCTATT CGGCTTCAAT 

1351 GTTAGCTGCA GAAAATGACC CTTATGGACC TTTGCCACCA GGCTGGGAAA 

5 1H01 AAAGAGTGGA TTCAACAGAC AGGGTTTACT TTGTGAATCA TAACACAAAA 

14S1 ACAACCCAGT GGGAAGATCC AAGAACTCAA GGCTTACAGA ATGAAGAACC 

1501 CCTGCCAGAA GGCTGGGAAA TTAGATATAC TCGTGAAGGT GTAAGGTACT 

1551 TTGTTGATCA TAACACAAGA ACAACAACAT TCAAAGATCC TCGCAATGGG 

IbOl AAGTCATCTG TAACTAAAGG TGGTCCACAA ATTGCTTATG AACGCGGCTT 

10 IbSl TAGGTGGAAG CTTGCTCACT TCCGTTATTT GTGCCAGTCT AATGCACTAC 

1701 CTAGTCATGT AAAGATCAAT GTGTCCCGGC AGACATTGTT TGAAGATTCC 

1751 TTCCAACAGA TTATGGCATT AAAACCCTAT GACTTGAGGA GGCGCTTATA 

IfiOl TGTAATATTT AGAGGAGAAG AAGGACTTGA TTATGGTGGC CTAGCGAGAG 

1SS1 AATGGTTTTT CTTGCTTTCA CATGAAGTTT T6AACCCAAT GTATTGCTTA 

15 1101 TTTGAGTATG CGGGCAAGAA CAACTATTGT CTGCAGATAA ATCCAGCATC 

1151 AACCATTAAT CCAGACCATC TTTCATACTT CTGTTTCATT GGTCGTTTTA 

2D01 TTGCCATGGC ACTATTTCAT GGAAAGTTTA TCGATACTGG TTTCTCTTTA 

2051 CCATTCTACA AGCGTATGTT AAGTAAAAAA CTTACTATTA AGGATTTGGA 

21D1 ATCTATTGAT ACTGAATTTT ATAACTCCCT TATCTGGATA AGAGATAACA 

20 2151 ACATTGAAGA ATGTGGCTTA GAAATGTACT TTTCTGTTGA CATGGAGATT 

2201 TTGGGAAAAG TTACTTCACA TGACCTGAAG TTGGGAGGTT CCAATATTCT 

2251 GGTGACTGAG GAGAACAAAG ATGAATATAT TGGTTTAATG ACAGAATGGC 

23D1 GTTTTTCTCG AGGAGTACAA GAACAGACCA AAGCTTTCCT TGATGGTTTT 

2351 AATGAAGTTG TTCCTCTTCA GTGGCTACAG TACTTCGATG AAAAAGAATT 

25 2401 AGAGGTTATG TTGTGTGGCA TGCAGGAGGT TGACTTGGCA GATTGGCAGA 

2451 GAAATACTGT TTATCGACAT TATACAAGAA ACAGCAAGCA AATCATTTGG 

2501 TTTTGGCAGT TTGTGAAAGA GACAGACAAT GAAGTAAGAA TGCGACTATT 

2551 GCAGTTCGTC ACTGGAACCT GCCGTTTACC TCTAGGAGGA TTTGCTGAGC 

2b01 TCATGGGAAG TAAT6GGCCT CAAAAGTTTT GCATTGAAAA AGTTGGCAAA 

30 EbSl GACACTTGGT TACCAAGAAG CCATACATGT TTTAATCGCT TGGATCTACC 

2701 ACCATATAAG AGTTATGAAC AACTAAAGGA AAAACTTCTT TTTGCAATAG 

2751 AAGAGACAGA GGGATTTGGA CAAGAATGAA TGTGGCTTCT TATTTTGGAG 

2601 GAGCTCTTGC ATTTAAATAC CCCAGCCAAG AAAAATTGCA CAGATAGTGT 

2fl51 ATATAAGCTG TTCATTCTGT ACAGTGAATT TTCCGAACCT CTCAAAGTAT 

35 2101 GTTTTCCGTT CTTCCACAGA AATATGCAAA ACAGTTCATC CTTTTCTACT 

2151 TTATTTATTG TTCCCTTGAA ATGACTGACC AGGAAAAAGA TCATCCTTAA 

3001 ATTTTGAAGC AAGTGAGAGA CTTTATTAAA AATACATATA TATCTATATA 

3051 AACATATATG ATAGTGGCTC TAGTTTTATA GAGCTCCAAG TGTATTAAAC 

3101 ATGACAGCCA TTCATTCATA AAGATCTGGA TTTGCTTTAC CTTGTTAATA 

40 3151 TTATCTAGGG GAAAAAGTGC AAATTGCTCC ATGTTCTTCT CTCCCTTATG 

3201 TAACATCTCC TGAGGGTGTT TAGTTGCATG GCTGTTCAGA AAGGTATTAA 

3251 GGGCTTAGGC CAAATCTTAC TTTGAGTATG TTAAAAAAAA AAAAATGCTG 

3301 CTGGCTTTTC TGAAGACAGG TGCTTGAACT TGTCAGTTTG TTTTAAATAA 

3351 ATACAATAGT TGAAAAAAAA AAAAAAAAAA AA 

45 

BLAST Results 



50 No BLAST result 



Medline entries 



55 

17313427: 

Pirozzi G-, PlcConnell SJ n Uveges Ad-i Carter JPI-i Sparks ABt Kay BK-i 
Fowlkes DM.t Identification of novel human UU domain-containing 
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proteins 

by cloning of ligand targets • J Biol Chem m? Jun 
b=,272<23) :14bll-b 



Peptide information for frame 5 



10 

ORF from 11 bp to 277b bpi peptide length: T22 
Category: known protein 
Classification: Protein management 
Prosite motifs: UU_D0HAIN_1 (3S5-360) 
15 ltlli)_I>0MAIN_l (337-412) 
lilU_D0HAIN_l (4b2-4fl7) 
UliU>0MAIN_l (502-527) 



20 1 MATASPRSDT SNNHSGRLdL (2VTVSSAKLK RKKNUFGTAI YTEVVVDGEI 

SI TKTAKSSSSS NPKUDEflLTV NVTPflTTLEF (2VUSHRTLKA DALLGKATID 

1D1 LK(3ALLIHNR KLERVKEiSLK LSLENKNGIA (3TGELTVVLD GLVIEflENIT 

151 NCSSSPTIEI (3ENGDALHEN GEPSARTTAR LAVEGTNGID NHVPTSTLVfl 

201 NSCCSYVVNG DNTPSSPSdV AARPKNTPAP KPLASEPADD TVNGESSSFA 

25 251 PTDNASVTGT PVVSEENALS PNCTSTTVED PPVfiEILTSS ENNECIPSTS 

301 AELESEARSI LEPDTSNSRS SSAFEAAKSR CIPDGCNDPVR flflSGNANTET 

351 LPSGUEfiRKD PHGRTYYVDH NTRTTTUERP fiPLPPGUERR VDDRRRVYYV 

401 DHNTRTTTUfl RPTMESVRNF E(2U<2S(2RNi3L flGAMflflFNflR YLYSASPILAA 

MSI ENDPYGPLPP GUEKRVDSTD RVYFVNHNTK TTfiUEDPRTd GL(JNEEPLPE 

30 SOI GUEIRYTREG VRYFVDHNTR TTTFKDPRNG KSSVTKGGP<2 IAYERGFRWK 

551 LAHFRYLCflS NALPSHVKIN VSRflTLFEDS FflfilMALKPY DLRRRLYVIF 

bDl RGEEGLDYGG LAREUFFLLS HEVLNPMYCL FEYAGKNNYC LfllNPASTIN 

bSl PDHLSYFCFI GRFIAHALFH GKFIDTGFSL PFYKRflLSKK LTIKDLESID 

701 TEFYNSLIUI RDNNIEECGL EHYFSVDHEI LGKVTSHDLK LGGSNILVTE 

35 751 ENKDEYIGLfl TEURFSRGVc! EfiTKAFLDGF NEVVPLi3b)L(3 YFDEKELEVfl 

301 LCGIK3EVDLA DUdRNTVYRH YTRNSKfllllil FlilflFVKETDN EVRflRLLCJFV 

fl51 TGTCRLPLGG FAELMGSNGP (2KFCIEKVGK DTULPRSHTC FNRLDLPPYK 

1D1 SYEflLKEKLL FAIEETEGFG (3E 

40 

BLASTP hits 

No BLASTP hits available 

45 

Alert BLASTP hits for »KFZphtes3_lld21i frame 2 
No Alert BLASTP hits found 
50 Pedant information for DKFZphtes3_lld21 n frame 2 



Report for DKFZphtes3_lld21.2 

55 

ELENGTHD =525 

IMhO IDSbSO-Sfl 

dpi] 5-bO 
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EHOI10L.ni TREI1BL : HSUlbll3_l product: °UUPl"i Homo sapiens 

Nedd-4-like ubiquitin-protein ligase IdUPl mRNA-. partial cds. Q-D 

EFUNCATJ 30.02 organization of plasma membrane ES. cerevisiae-. 

YER12SwJ le-141 

5 EFUNCATJ 11-01 stress response ES. cerevisiae-. YER125wJ le- 

EFUNCATJ Ob- 13. 01 cytoplasmic degradation ES. cerevisiae-. 

YER12SwJ le-mT 

EFUNCATJ 03.10 sporulation and germination ES. cerevisiaei 

10 YER12SwJ le-141 

EFUNCATJ Ob.O? protein modification (glycolsylation-. acylation-. 

myristylation-i palmitylation-. f arnesylation and processing) 

ES. cerevisiaei YER125wJ le-mi 

EFUNCATJ D3*22 cell cycle control and mitosis ES. cerevisiae-. 

15 YDR457wJ le-7fl 

EFUNCATJ *n unclassified proteins ES- cerevisiae-. YJR03bcJ 

?e-3T 

EFUNCATJ 30.03 organization of cytoplasm ES* cerevisiaei 

YKLOlOcJ fle-21 

20 EFUNCATJ 30-10 nuclear organization ES. cerevisiae-. YKL012w3 
be-OS 

EFUNCATJ 04-05-03 mrna processing (splicing) ES. cerevisiae-. 

YKL012wJ be-05 

EFUNCATJ 30-01 organization of cell wall ES. cerevisiae-. 

25 YIROllcJ 3e-04 

EFUNCATJ 30-10 extracellular/secretion proteins ES- cerevisiae-. 

YIROllcJ 3e-04 

EFUNCATJ 01.05*01 carbohydrate utilization ES. cerevisiae-. 

YIROllcJ 3e-04 

30 EBLOCKSJ BP03?4bE 

EBLOCKSJ BP037blG 

EBLOCKSJ BL00514E Fibrinogen beta and gamma chains C-terminal 

domain proteins 

EBL0CKSJ PR0D731B 

35 EBL0C<SJ BPOlSbbC 

EBLOCKSJ BLOllSI UU/rspS/UUP domain proteins 

EBLOCKSJ PR00403B 

EBLOCKSJ PR00403A 

EBLOCKSJ PF00b32B 

40 EBLOCKSJ PFO0b32A 

EECJ b-3-2-11 Ubiquitin — protein ligase le-151 

EPIRKUJ ligase le-151 

EPIRKUJ transmembrane protein 2e-37 

EPIRKUJ leucine zipper 2e-2fl 

45 ESUPFAMJ Ul hi repeat homology le-151 

ESUPFAilJ UD repeat homology 2e-2fi 

ESUPFAMJ ubiquitin ligase homolog le-151 

EPROSITEJ Uli)_DOMAIN_l 4 

EPFAflJ UU/rsp5/WUP domain containing proteins 

50 EPFAMJ C2 domain 

EKIilJ Alpha_Beta 

EKUJ LOUICOMPLEXITY 1-41 '/. 



55 SE<2 FUDMATASPRSDTSNNHSGRLflLflVTVSSAKLKRKKNUFGTAIYTEVVVDGEITKTAKSS 

SEG 

PRD ccccccccccccccccccceeeeeehhhhhhhhhhhhccccceeeeeeeccccceeeecc 
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SE<2 SSSNPKUPE(2LTVNVTPl3TTLEF(2VUSHRTLKADALLGKATIDLK(2ALLIHNRKLERVKE 

SEG 

PRD ccccccccceeeeeccccceeeeeeecchhhhhhhhhhhhhhhhhhhhhhhchhhhhhhh 

5 SE(2 <3LKLSLENKNGIA(2TGELTVVLDGLVIE<3ENITNCSSSPTIEI(3ENGDALHENGEPSART 

SEG 

PRD hhhhhhcccccccccceeeeeecceeeeeeeccccccccceGeecccccccccccccchh 

SE<2 TARLAVEGTNGIDNHVPTSTLVflNSCCSYVVNGDNTPSSPStmARPKNTPAPKPLASEP 

10 SEG 

PRD hhhhhhcccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEA ADDTVNGESSSFAPTDNASVTGTPVVSEENALSPNCTSTTVEDPPVflEILTSSENNECIP 

SEG 

15 PRD cccccccccccccccccceeeccccccccccccccccccccccccccccccccccccccc 

SEfl STSAELESEARSILEPDTSNSRSSSAFEAAKSR<3PDGCt1DPVR<2<2SGNANTETLPSGIjE<a 

SEG 

PRD ccccccccceeeeccccccccccccccccccccccccccccccccccccccccccccccc 

20 

SE(2 RKDPHGRTYYVDHNTRTTTWERPflPLPPGWERRVDDRRRVYYVDHNTRTTTWflRPTflESV 

SEG xxxxxxxxxxxxx 

PRD ccccccceeeecccccccccccccccccccccccccccceeeeecccccccccccccccc 

25 SE(2 RNFE(JUdJS(2RNflL(JGArifl(2FNflRYLYSASriLAAENDPYGPLPPGUEKRVDSTDRVYFVNH 

SEG 

PRD hhhhhhhhhhhhhhhhhhhcccccccccccccccccccccccccceeeeccccceeeeec 

SE(J NTKTT<2WEDPRT(2GL(2NEEPLPEGUEIRYTREGVRYFVDHNTRTTTFKDPRNGKSSVTKG 

30 SEG 

PRD ccceeeecccccccccccccccccceeeeecccceeeeeccceeeeeccccccccccccc 

SE<2 GP(2IAYERGFRIi)KLAHFRYLC(3SNALPSHVKINVSR(!TLFEDSFt2(3IMALKPYDLRRRLY 

SEG 

35 PRD cccccchhhhhhhhhhhhhhhhhcccccceeeeehhhhhhhhhhhhhhhhcchhhhhhhh 

SE(3 VIFRGEEGLDYGGLAREliJFFLLSHEVLNPflYCLFEYAGKNNYCLfllNPASTINPDHLSYF 

SEG 

PRD hhhccccccccccchhhhhhhhhhhccccccceeeeecccceeeeecccccccccceeee 

40 

SEfJ cfigrfiamalfhgkfidtgfslpfykrhlskkltikdlesidtefynsliuirdnniee 

SEG 

PRD hhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhhcccccccchhhhhheeeeeccccc 

45 SEt2 CGLEMYFSVDNEILGKVTSHDLKLGGSNILVTEENKDEYIGLNTEURFSRGV(2E(2TKAFL 

SEG 

PRD chhhhhhhhhccccceeeeeeeccccceeeeeeccchhhhhhhhhhhhhhhhhhhhhhhh 

SEA DGFNEVVPLflULUYFDEKELEVHLCGI1(3EVDLADUlJRNTVYRHYTRNSKflIIUFIi)(JFVKE 

50 SEG 

PRD hhhhhcccccchhhhhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhh 

SEfl TDNEVRMRLLflFVTGTCRLPLGGFAELMGSNGPflKFCIEKVGKDTblLPRSHTCFNRLDLP 

SEG 

55 PRD hchhhhhhhhhhhccccccccccceeeecccccceeeeeecccccccccccccccccccc 

SE(2 PYKSYEi2LKEKLLFAIEETEGFG<2E 

SEG 
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Prosite for DKFZphtes3_lld21 • 2 



PSDllSI 
PSOllST 
PSOllBT 
10 PSD11ST 



3Sfl->3fli4 
5DS->531 



Uli)_DOHAIN_l 
UU_D0NAIN_1 
Qllil_])OnAIN_l 
Ulil_l>0riAIN_l 



PD0CSDD2D 
PD0C5DO2D 
PDOC5DD20 
PDOC5D020 



15 



Pfam for DKFZphtes3_Hd21 - 2 



25 



30 



HMri_NAME C2 domain 



HtlPI 

20 *LtVrIIeARNLUkri]>HnGfSI>PYVKVdndPdpkl>tkKlilKTkTiUNN.GL 

L V++ +A+ +K++++G+ Y +V +D++ TKT 

+++ + 

fluery 23 LfiVTVSSAKLKRKKNIilFGTA-IYTEVVVDGE 

ITKTAKSSSSS b3 



HfW NPVWNEEeFvFedlPyPdlqrkllLRFaVlilDlilDRFSRBDFIGHCi* 

NP ld+ E+++ + + + L+F+VU + ++ + ++G ++ 

fluery NPKUD-EflLTVN VTPtJTT — LEFflVUSHRTLKADALLGKAT 

101 



HMIi NAME lilU/rspS/UUP domain containing proteins 

35 H MM *LPsGUEeHUl>psGRpUYYIi)NHETkTT(3blEpP* 

LPSGWE+++DP GR+ YY++H+T+TT+WE+P 
fiuery 35>4 LPSGUEflRKDPHGRT-YYVDHNTRTT TWERP 383 

5D • 01 3flb ms 1 31 dkfzphtes3_lld21.2 similarity to 

40 Nedd-M-like ubiquitin-protein ligase (Homo sapiens) 
Alignment to HMM consensus: 
fluery *LPsGUEeHlJDpsGRpblYYUNHETkTTi2lilEpP* 

LP+GUE++ D+ R YY++H+T+TT+U++P 
dkfzphtes3 3flb LPPGIi)ERRVDDRRRV-YYVDHNTRTTTIil<2RP ms 

45 

fiuery **1U 1 31 dkf zphtes3_lld21 -2 similarity to 

Nedd-4-like ubiquitin-protein ligase (Homo sapiens) 

Alignment to HMM consensus: 
HMM *LPsGUEeHUI)psGRpUYYUNHETkTT(3li)EpP* 
50 LP+GUE++ D + R Y++NH+TKTT<2WE+P 

fiuery Ibl LPPGUEKRVDSTDRV-YFVNHNTKTTfiliJEDP M10 

3fl.b2 501 S30 1 31 dkfzphtes3_lld21.2 similarity to 

Nedd-1-like ubiquitin-protein ligase (Homo sapiens) 
55 Alignment to HPlIi consensus: 

fluery *LPsGUEeHlill>psGRpL)YYUNHETkTTl3UEpP* 

LP GUE +++ +G + Y+++H+T+TT+ ++P 
dkfzphtes3 5D1 LPEGWEIRYTREGVR-YFVDHNTRTTTFKDP S3D 
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5 group: testis derived 

DKFZphtes3_llel7 encodes a novel 573 amino acid protein without 
similarity to known proteins- 

10 No informative BLAST resultsi No predictive prosite-. pfam or SCOP 
motife. 



15 



20 



25 



The new protein can find application in studying the expression 
profile of testis-specif ic genes. 



unknown protein 
Sequenced by (Jiagen 
Locus: unknown 
Insert length: BIOS bp 

Poly A stretch at pos. EOflOi polyadenylation signal at pos- BDST 



1 GGCC7GGGGG GCTTCCCTGG GGGGCTTGTC GCCGGGGCCG CCTGGGCTTT 

51 CAGGTCTTCC GAGGCTGACA TTCACGTTTC ATTCTGCCAC ACTCGGGAAC 

101 GGTGATCGGG GAAGCATGGG GATCCGGGAG AAGCACCCAC AAAACTAGCA 

30 151 TCCTCCTGGA GGAGCTCGGG AATAGGATGA GTGATAATCC ACCCAGAATG 

501 GAAGTGTGTC CTTACTGTAA GAAGCCATTT AAACGATTAA AATCCCACTT 

251 GCCATACTGT AAGATGATAG GATCAACCAT ACCTACTGAT CAAAAAGTTT 

301 ATCAGTCCAA GCCAGCTACA CTCCCACGTG CTAAAAAGAT GAAAGGACCA 

351 ATCAAAGATT TAATTAAAGC TAAAGGGAAA GAGTTAGAGA CAGAGAATGA 

35 401 AGAAAGAAAT TCTAAGTTGG TGGTGGACAA ACCAGAACAG ACAGTGAAGA 

451 CCTTTCCACT GCCAGCTGTT GGTTTGGAAA GAGCAGCTAC TACAAAGGCA 

501 GATAAAGACA TCAAGAATCC AATCCAACCA TCCTTCAAAA TGTTAAAAAA 

551 TACTAAACCA ATGACTACTT TCCAAGAAGA AACCAAGGCT CAGTTTTACG 

bOl CATCAGAGAA AACCTCTCCT AAAAGAGAAC TTGCCAAAGA TTTGCCTAAA 

40 t.51 TCAGGAGAAA GTCGATGTAA TCCTTCAGAA GCT6GAGCGT CTTTACTGGT 

701 TGGCTCAATA GAACCTTCTT TGTCAAATCA AGATAGAAAA TATTCCTCAA 

751 CTCTACCTAA TGATGTACAA ACTACCTCTG GTGATCTCAA ATTGGACAAA 

601 ATTGATCCCC AAAGACAGGA ACTTCTAGTA AAATTACTAG ATGTGCCTAC 

B51 TGGTGATTGT CATATTTCTC CAAAGAATGT CAGTGATGGG GTTAAAAGGG 

45 101 TAAGAACATT ATTAAGCAAT GAGAGAGATT CCAAAGGCAG GGATCACCTC 

=151 TCAGGAGTCC CTACTGATGT TACAGTTACT GAGACTCCAG AAAAGAACAC 

1001 AGAATCCCTC ATTTTAAGCC TTAAAATGA6 CTCATTAGGT AAAATCCAAG 

1051 TCATGGAGAA ACAAGAGAAA GGACTTACCC TGGGAGTAGA GACGTGTGGG 

1101 AGCAAAGGAA ATGCAGAGAA AAGTATGTCT GCAACAGAAA AGCAGGAACG 

50 1151 GACTGTCATG AGCCATGGCT GTGAGAACTT CAACACCAGG GATTCAGTCA 

1S01 CAGGAAAGGA GTCTCAAGGG GAAAGACCAC ATTTAAGTTT GTTCATTCCG 

1H51 AGGGAGACGA CTTACCAGTT TCATTCTGTA TCGCAGTCAA GTAGTCAAAG 

1301 TCTTGCCTCT CTAGCTACAA CATTTCTTCA AGAAAAGAAA GCAGAAGCCC 

1351 AGAATCATAA TTGTGTCCCT GATGTAAAGG CATTAATGGA GAGTCCCGAG 

55 1401 GGACAGTTAT CTCTGGAGCC CAAATCTGAT AGTCAGTTCC AAGCATCACA 

1451 CACTGGGTGC CAGAGCCCTT TATGTTCAGC CCAGCGTCAC ACTCCTCAGA 

1501 GCCCCTTCAC CAATCATGCT GCAGCTGCTG GCAGGAAGAC TCTTCGCAGC 

1551 TGCATGGGGC TGGAGTGGTT TCCAGAGCTC TATCCTGGTT ACCTTGGACT 
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IbOl AGGGGTGTTG 
lb51 CACAACTTAT 
1701 TGCAACACCA 
1751 CACAGGATAC 
IflOl AATTGTGCCG 
1A51 GTGGCGAAGA 
nOl AGCACGTTTA 
1151 GCCCTGTGTT 
2001 CATTAAGGGA 
2051 ACTGCTATAA 
2101 AA 



CCAGGGAAGC 
CAGTCCCCAG 
CCATAAGGAA 
TTCGTCCTGT 
ACCCCTGCCC 
CGACTGGGGA 
AGTAGGAGAA 
GCCCACTGAA 
TAGCTTTTCA 
ATAAAGTAGT 



CTCAGTGTTG 
GGGGAAAGAC 
GAGTGGATTC 
GTTGTAGCTG 
TGGAAGAGCA 
TTGCCGCTCT 
GCCTTCGTGA 
TTGCCCTGTA 
GCCCTCAAGG 
ATCACTTGTC 



GAATGCAATG 
TCTCACAAGG 
GGTGGCATCA 
GAGTTTCAGA 
CAGTACCTCC 
AAAACATGTT 
CTTCTCTCTA 
ACACCTAAGT 
TTATCAGGAG 
ATAAAAAAAA 
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ACCCAGAAGC 
CTGGATCAGG 
CTATGCTCTT 
CGTCTGAAAA 
ATGCATTGGT 
TGGATTAGGA 
GTGCCTTCGT 
GTAGTGGTAG 
CATTTGTATC 
AAAAAAAAAA 



15 



No BLAST result 



BLAST Results 



20 



Medline entries 



25 



No Medline entry 



Peptide information for frame 3 



30 ORF from 177 bp to 1615 bpi peptide length: 573 
Category: putative protein 
Classification: no clue 



1 MSDNPPRMEV CPYCKKPFKR LKSHLPYCKfl 

35 51 RAKKMKGPIK DLIKAKGKEL ETENEERNSK 

101 ERAATTKADK DIKNPIfiPSF KflLKNTKPUT 

151 elakdlpksg esrcnpseag asllvgsiep 

201 sgdlkldkid pdrljellvkl ldvptgdchi 

251 dskgrdhlsg vptdvtvtet peknteslil 

40 301 tlgvetcgsk gnaekshsat ekflertvmsh 

351 phlslfipre tty(3fhsvs(2 sssflslasla 

401 kaliiespegu lslepksdsa faashtgcas 

451 agrktlrscm gleufpelyp gylglgvlpg 

501 RLSUGUIRCN TTIRKSGFGG ITMLFTGYFV 

45 551 STVPPCIGVA KTTGDCRSKT CLD 



IGSTIPTDflK 
LVVDKPEUTV 
TFfiEETKAflF 
SLSN<2DRKYS 
SPKNVSDGVK 
SLKMSSLGKI 
GCENFNTRDS 
TTFLdEKKAE 
PLCSAURHTP 
KPflCUNANTfl 
LCCSUSFRRL 



VYflSKPATLP 
KTFPLPAVGL 
YASEKTSPKR 
STLPNDVflTT 
RVRTLLSNER 
(3VHEK(2EKGL 
VTGKESflGER 
AtfNHNCVPDV 
(3SPFTNHAAA 
KPfiLISPflGE 
KKLCRPLPWK 



50 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_llel7, frame 3 
55 No Alert BLASTP hits found 

Pedant information for DKFZphtes3_llel7i frame 3 
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Report for DKFZphtes3_llel? . 3 

5 [LENGTH! 5?3 

cnio b33a=i.fl6 

Eplll T-EM 

EBL0CKS3 BLODQEfl Zinc finger-. CSHE type-, domain proteins 

CKIO Alpha_Beta 

10 EKU3 L0U_C0I1PLEXITY 7-50 Z 

SEfl nSDNPPRNEVCPYCKKPFKRLKSHLPYCKMIGSTIPTDflKVYflSKPATLPRAKKhTGPIK 

SEG 

15 PRD ccccccceeecccccchhhhhhhcccceeeeccccccceeeeeccccchhhhhhhcccch 

SEfl DLIKAKGKELETENEERNSKLVVDKPEflTVKTFPLPAVGLERAATTKADKDIKNPIflPSF 

SEG 

PRD hhhhhhcccchhhhhhhhheeeeccccceeecccccchhhhhhhhhhhcccccccccchh 

20 

SEfl KMLKNTKPHTTFflEETKAflFYASEKTSPKRELAKDLPKSGESRCNPSEAGASLLVGSIEP 

SEG 

PRD hhhhcccccchhhhhhhhhhhhhcccccchhhhhccccccccccccccchhhhhhhcccc 

25 SEfl SLSNflDRKYSSTLPNDVflTTSGDLKLDKIDPflRflELLVKLLDVPTGDCHISPKNVSDGVK 

SEG 

PRD ccccccceeecccccccccccccccccccccchhhhhhhhhccccccccccccccccchh 

SEfl RVRTLLSNERDSKGRDHLSGVPTDVTVTETPEKNTESLILSLKIISSLGKIflVMEKflEKGL 

30 SEG xxxxxxxxxxxx 

PRD hhhhhhhhcccccccccccccccceeeeeccccchhhhhhhhhhccccchhhhhhhhccc 

SEfl TLGVETCGSKGNAEKSMSATEKflERTVMSHGCENFNTRDSVTGKESflGERPHLSLFIPRE 
SEG . 

35 PRD eeeeecccccccchhhhhhhhhhhhhhhcccccccccccccccccccccccceeeeeccc 

SEfl TTYflFHSVSflSSSflSLASLATTFLflEKKAEAflNHNCVPDVKALIIESPEGflLSLEPKSDSfl 

SEG xxxxxxxxxxxxxx 

PRD eeeeeeccccccchhhhhhhhhhhhhhhhhhhccccccchhhhhcccccccccccccccc 

40 

SEfl FflASHTGCflSPLCSAflRHTPflSPFTNHAAAAGRKTLRSCMGLEWFPELYPGYLGLGVLPG 

SEG xxxxxxxxxxxxxxx 

PRD cccccccccccccccccccccccccchhhhhcchhhhhhccccccccccccccceeeccc 

45 SEfl KPflCUNAIITflKPflLISPflGERLSflGUIRCNTTIRKSGFGGITnLFTGYFVLCCSUSFRRL 
SEG xx 

PRD ccccccccccccccccccccchhhhhccccceeeecccccceeeecceeeeeecchhhhh 

SEfl KKLCRPLPUKSTVPPCIGVAKTTGDCRSKTCLD 

50 SEG 

PRD hhhccccccccccccceeeeecccccccccccc 



55 



(No Prosite data available for DKFZphtes3_Hel7-3) 
(No Pfam data available for DKFZphtes3_llel7 • 3) 
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5 group: testis derived 

DKFZphtes3_12dlfl encodes a novel 1170 amino acid protein without 
similarity to known proteins- 

10 The EST-distribution signifies an ubiquitous expression pattern- 
No informative BLAST resultsi No predictive prosite-. pfam or SCOP 
motif e • 

The new protein can find application in studying the expression 
15 profile of testis-specif ic genes- 



unknown protein 
20 perhaps complete cds- 
Sequenced by <3iagen 

Locus: /map="13b- c l cR from top of Chrl3 linkage group" 

25 

Insert length: SHbT bp 

Poly A stretch at pos- 544Ti polyadenylation signal at pos- 5420 



30 1 AAGGACAGAG GACGAGATTT T6AACGACAA AGAGAAAAGA GAGACAAGCC 

51 AAGGTCTACT TCCCCAGCAG GACAGCATCA TTCTCCTATA TCTTCTAGAC 

101 ATCACTCATC TTCCTCACAA TCAGGATCAT CTATTCAAAG ACATTCTCCT 

151 TCTCCTCGTC GAAAAAGAAC TCCTTCACCA TCTTATCAGC GGACACTAAC 

201 TCCACCTTTA CGACGCTCTG CCTCTCCTTA TCCTTCACAT TCTTTGTCGT 

35 251 CTCCCCAGAG AAAGCAGAGT CCTCCAAGAC ATCGCTCTCC AATGCGAGAG 

301 AAAGGGAGAC ATGATCATGA ACGAACTTCA CAGTCTCATG ATCGACGCCA 

351 CGAAAGGAGG GAAGATACTA GGGGCAAACG AGACAGAGAA AAGGACTCAA 

401 GAGAAGAACG AGAATATGAA CAGGATCAGA GCTCTTCTAG AGACCACAGA 

451 GATGACAGAG AACCTCGAGA TGGTCGGGAT CGGAGAGATG CGAGAGATAC 

40 5D1 TAGGGACCGA AGGGAACTAA GAGACTCCAG AGACATGCGG GACTCAAGGG 

551 AGATGAGAGA TTATAGCAGA GATACCAAA6 AGAGCCGTGA TCCCAGAGAT 

bOl TCTC66TCCA CTCGTGATGC CCATGACTAC AGGGACCGTG AAGGTCGAGA 

b51 TACTCATCGA AAGGAGGATA CATATCCAGA AGAATCCCGG AGTTATGGCC 

7D1 GAAACCATTT GAGAGAAGAA AGTTCTCGTA CGGAAATAAG GAATGAGTCC 

45 751 AGAAATGAGT CTCGAAGTGA AATTAGAAAT GACCGAATGG GCCGAAGTAG 

601 GGGGAGGGTT CCTGAGTTAC CTGAAAAGGG AAGTCGAGGC TCAAGAGGTT 

SSI CTCAAATTGA TAGTCACAGT AGTAATAGCA ACTATCATGA CAGCTGGGAA 

=101 ACTCGAAGTA GCTATCCTGA AAGAGATAGA TATCCTGAAA GAGACAACA6 

^51 AGATCAAGCA AGGGATTCTT CCTTTGAGAG AAGACATGGA GA6CGAGACC 

50 1001 GTCGTGACAA CAGAGAGAGA GATCAAAGAC CAAGCTCACC AATTCGACAT 

1051 CAGGGAAGGA ATGACGAGCT TGAGCGTGAT GAAAGAAGAG AGGAACGAAG 

1101 AGTAGACAGA GTGGATGATA GGAGAGATGA AAGGGCTAGA GAGAGAGATC 

1151 GGGAACGAGA ACGAGACAGG GAGCGGGAGA GAGAGAGGGA ACGTGAACGG 

1201 GATCGGGAAA GAGAAAAAGA GAGAGAACTA GAAAGAGAGC GTGCTAGGGA 

55 1251 ACGGGAGAGA GAAAGAGAAA AAGAGAGAGA TCGTGAAAGG GATAGAGACC 

13D1 GAGACCACGA TCGAGAGCGG GAAAGAGAGA GGGAACGAGA CAGGGAAAAA 

1351 GAACGGGAAC GAGAAAGAGA AGAGAGAGAG AGGGAGAGAG AGCGAGAACG 

1401 GGAGAGAGAG CGAGAGCGAG AACGGGAACG AGAAAGAGCG AGAGAAAGGG 
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mSl ATAAAGAACG AGAACGCCAA AGGGATTGGG AAGACAAAGA CAAAGGACGA 

1SQ1 GATGACCGCA GAGAAAAGCG AGAAGAGATC CGAGAAGATA GGAATCCAAG 

15S1 AGATGGACAT GATGAAAGAA AATCAAAGAA GCGCTATAGA AATGAAGGGA 

IbOl GTCCCAGCCC TAGACAGTCC CCGAAGCGCC GGCGTGAACA TTCTCC6GAC 

5 IbSl AGTGATGCCT ACAACAGTGG AGATGATAAA AATGAAAAAC ACAGACTCTT 

1701 GAGCCAAGTT GTACGACCTC AAGAATCTCG TTCTCTTAGT CCCTCGCACC 

1751 TCACAGAAGA CAGACAGGGT AGATGGAAAG AGGAGGATCG TAAACCAGAA 

IflOl AGGAAAGAGA GTTCAAGGCG CTACGAAGAA CAGGAACTCA AGGAGAAAGT 

IfiSl TTCTTCTGTA GATAAACAGA GAGAACAGAC AGAAATCCTG GAAAGCTCAA 

10 1101 GAATGCGTGC ACAGGACATT ATAGGACACC ACCAGTCTGA AGATCGAGAG 

1151 ACATCTGATC GAGCTCATGA TGAAAACAAG AAGAAAGCAA AAATTCAAAA 

2001 GAAACCAATT AAGAAAAAGA AAGAGGATGA TGTTGGAATA GAGAGGGGTA 

2051 ACATAGAGAC AACATCTGAA GATGGTCAAG TATTTTCACC AAAAAAAGGA 

21D1 CAGAAAAAGA AAAGCATTGA AAAAAAACGT AAAAAATCCA AAGGTGATTC 

15 2151 TGATATTTCT GATGAAGAAG CAGCCCAGCA AAGTAAGAAG AAAAGAGGCC 

2201 CACGGACTCC CCCTATAACA ACTAAAGAGG AATTGGTTGA AATGTGCAAT 

2251 GGTAAGAATG GTATTCTAGA GGACTCCCAG AAAAAAGAAG ATACAGCATT 

2301 CAGTGACTGG TCTGATGAGG ATGTCCCTGA CCGTACAGAG GTGACAGAAG 

2351 CAGAGCATAC TGCCACCGCC ACGACTCCTG GTAGTACCCC TTCTCCTCTA 

20 2M01 TCTTCTCTTC TTCCTCCTCC ACCGCCTGTG GCTACTGCCA CTGCTACAAC 

21151 TGTGCCTGCA ACTCTTGCTG CCACTACTGC TGCTGCCGCC ACCTCTTTCA 

25D1 GCACATCTGC CATCACTATT TCCACCTCTG CCACCCCCAC CAATACCACC 

2551 AATAATACTT TTGCCAATGA AGACTCACAC AGAAAATGCC ACAGAACACG 

2b01 AGTAGAAAAA GTAGAGACGC CTCACGTGAC TATAGAAGAT GCACAGCATC 

25 2b51 GCAAGCCTAT GGATCAAAAG AGGAGCAGCA GCCTCGGGAG CAATCGGAGT 

2701 AACCGTAGTC ATACGTCTGG TCGTCTTCGC TCCCCATCCA ATGATTCAGC 

2751 CCATCGAAGT GGAGATGACC AAAGTGGTCG AAAGAGAGTA CTGCACAGTG 

2801 GCTCAAGAGA TAGAGAAAAA ACAAAAAGCC TGGAAATCAC AGGAGAGAGA 

2851 AAATCTAGGA TTGATCAGTT AAAGCGTGGA GAACCCAGTC GAAGTACTTC 

2901 TTCAGATCGC CAGGATTCAA GAAGCCATAG TTCAAGAAGA AGTTCTCCAG 

2951 AGTCAGATCG ACAGGTCCAT TCAAGATCTG GGTCATTTGA TAGCAGAGAC 

3001 AGGCTTCAAG AACGAGATCG ATATGAACAC GACAGAGAGC GCGAGAGAGA 

3051 GAGGAGAGAT ACGAGGCAGA GAGAATGGGA CCGAGATGCT GATAAAGATT 

3101 GGCCACGCAA CAGGGATCGA GATAGATTGC GAGAACGAGA ACGAGAGAGA 

3151 GAACGAGACA AAAGGAGAGA CTTGGATAGG GAAAGAGAGA GACTAATTTC 

3201 TGATTCTGTT GAAAGGGACA GGGACAGAGA CAGAGACAGA ACTTTTGAGA 

3251 GTTCTCAAAT AGAGTCTGTG AAACGCTGTG AAGGAAAACT GGAAGGTGAA 

3301 CATGAAAGGG ATCTAGAAAG CACTTCCCGA GACTCTCTAG CCTTGGATAA 

3351 AGAGAGAATG GATAAAGATC TGGGATCTGT GCAGGGATTT GAAGATACAA 

3401 ATAAATCCGA GAGAACTGAG AGTCTGGAAG CAGGAGATGA CGAGTCCAAG 

3451 TTAGATGATG CACATTCATT AGGCTCTGGT GCTGGAGAAG GATACGAGCC 

3501 AATCAGTGAT GACGAACTAG ATGAAATTCT GGCAGGTGAT GCAGAAAAGA 

3551 GGGAGGACCA ACAGGATGAG GAGAAGATGC CAGATCCCTT AGATGTGATA 

3601 GATGTGGATT GGTCTGGTCT TATGCCAAAG CATCCAAAAG AACCACGAGA 

3651 GCCTGGGGCT GCACTCTTAA AATTCACACC TGGAGCTGTT ATGCTAAGAG 

3701 TTGGGATTTC TAAAAAGTTG GCAGGTTCTG AACTCTTTGC CAAAGTCAAA 

3751 GAAACATGTC AGAGACTTTT AGAAAAACCC AAAGGTAGTT TCATTTTACT 

3801 TTAACTATAT AATGTCTGTT AACCATTTAA GATGCCATCT GAAGGGGATT 

3851 CTGATCTGTT CTTATGTAGC ACTTAACACT GTGTAGAAAC TATTTTTTGA 

3901 GAAATCATTT TATAATCATT ATTTAACCCT CATGGTCAAA GTTTCTCTTT 

3951 AAAATTTATT TTGAGAAGAA GAGTTATCCC ACAGAAAAGT TGGGAAAAGA 

4001 GTACAATGAC CTTTTTGTAT GAAAATTACT TATTAACAGG CCAGGCGTGG 

4051 TGTTGCATGT CTGTAGTCAC AGCTACTCAG GGAGGTTGAG GCAGCAGGAT 

4101 TGCTGGAGCC CAGGAAATTG AGGCTGCAGT GAGCCATGAT TGAGCCACCA 

4151 CACTCCAACC TAGGTGACAG AGCAAGACCC TGTCTCAAAA AAAAAAAAAC 

4201 AAATTAACCA ATAAGTTCTA ATATCAAAGT GCTCAGTGGT TTGCCCTTGG 

4251 CTAAATGAAG CAGAGCCAGG AAAAACAGAC TACATATTTT TCATGTCTAA 

4301 AGAAATTGGG TATTTTGGGA GCCCTTTCCC CTAGACATCT ACCCAAATGC 

4351 AGGTGTGTAG GTTGAGTCTT TAACAAAGTG ATTAAGAGCT TGGTCTGTAA 

4401 GGCCGGATGA TCTGGATTTC AGTAGGCACA CCACTTACTG GCTATTACTT 

4451 AATCTGTGTG TTAGTGTCAT CATCTGTAAG TCAGGAATAA TCATACCACC 

4501 AACTTCCTAT GGTAATTAGG AGCAAATGAG TTATTACAGG CAAAACACTT 

4551 AGAACAGTTC CTGGCATATA GTAATACCCA ATAAATATTA ACTGCTACTT 

4601 TGAAAATATC CTATCACGCT GATTTTTGAC CTCACTGCAG CAATTTTCAG 

4651 TTATTCCAGA TTATCTAGCT TATGGATTCT GGTGGTAGGG GTTGTTTGGT 

4701 TTTGGTTTTC ACTGTCTCTG TCTCATCTAG TACCTACCTT AGTTTATTTT 

4751 GCAACTTACT AATACTTTAT TAATGGGGAG GGACGAGTAG ATGGTAAAAA 

4801 GAAGGAAAAG GAGGTAAAAG GTGAAAGGAA CAACATTAAT TAACAATTTT 

4851 ACGTCATGTC CCTGGACATA AAAGTTTAGT TAGTATTAAA TTTTTCACTA 

4901 ATACAAAATA AAAAAATATT GTTTTATGAG TTTTATGAAT TCATGCCCTT 

4951 CCTTTACTCT ATTAGCATAA GCAGTAAATT TTTTTATTTT AATATAGCCC 

5001 AATAAACCTA GAGTATACAT GTACAAAATA CATATAATTG TTAACGTGTA 

5051 TTAACCGAAA AATGACCCAA GACTTAGTTC TTGCCCTACT GTATCTGCCT 

5101 TGTTTGGTTG GTTCTGTGAC CTTAAGCAAA TAACTCCTGT GAGCCTCAAT 

5151 TTTATTTGTA AAGTGATGGA ATAAAACCCC TAAAATCTTA CCCACCTCTA 

5201 AAGATATTTG TTTCTGTGAC CTTTTGCTAG TAGCATTTCA AGTTAAAATC 

5251 TGGTTTGATT TTGCTACCCA TGAAATACAG TTCGGCCCTT ACTTATTGAT 

5301 GACTTAACCT AAACAGTGAA AATATGCACT GTAAAGGGTG GGGTGATGTG 

5351 GCTTAACAAT CAGACTTCTT CTATTTTTGC TGCTATGGTG GTTGTATTAG 

5401 AGAACTGATG TATTATCTTG AATAAAGACT TTGTCTTGTT TACTGCCCTA 
5451 AAAAAAAAAA AAAAAAAAA 


BLAST Results 



No BLAST result 


Medline entries 



No Medline entry 



Peptide information for frame 1 


ORF from 292 bp to 3801 bp; peptide length: 1170 
Category: similarity to unknown protein 
Classification: no clue 


1 MREKGRHDHE RTSQSHDRRH ERREDTRGKR DREKDSREER EYEQDQSSSR 

51 DHRDDREPRD GRDRRDARDT RDRRELRDSR DMRDSREMRD YSRDTKESRD 

101 PRDSRSTRDA HDYRDREGRD THRKEDTYPE ESRSYGRNHL REESSRTEIR 

151 NESRNESRSE IRNDRMGRSR GRVPELPEKG SRGSRGSQID SHSSNSNYHD 

201 SUETRSSYPE RDRYPERDNR DQARDSSFER RHGERDRRDN RERDQRPSSP 

251 IRWQGRNDEL ERDERREERR VDRVDDRRDE RARERDRERE RDRERERERE 

301 RERDREREKE RELERERARE REREREKERD RERDRDRDHD RERERERERD 

351 REKERERERE ERERERERER ERERERERER ERARERDKER ERQRDWEDKD 

401 KGRDDRREKR EEIREDRNPR DGHDERKSKK RYRNEGSPSP RQSPKRRREH 

451 SPDSDAYNSG DDKNEKHRLL SQVVRPQESR SLSPSHLTED RQGRWKEEDR 

501 KPERKESSRR YEEQELKEKV SSVDKQREQT EILESSRMRA QDIIGHHQSE 

551 DRETSDRAHD ENKKKAKIQK KPIKKKKEDD VGIERGNIET TSEDGQVFSP 

601 KKGQKKKSIE KKRKKSKGDS DISDEEAAQQ SKKKRGPRTP PITTKEELVE 

651 MCNGKNGILE DSQKKEDTAF SDWSDEDVPD RTEVTEAEHT ATATTPGSTP 

701 SPLSSLLPPP PPVATATATT VPATLAATTA AAATSFSTSA ITISTSATPT 

751 NTTNNTFANE DSHRKCHRTR VEKVETPHVT IEDAQHRKPM DQKRSSSLGS 

801 NRSNRSHTSG RLRSPSNDSA HRSGDDQSGR KRVLHSGSRD REKTKSLEIT 

851 GERKSRIDQL KRGEPSRSTS SDRQDSRSHS SRRSSPESDR QVHSRSGSFD 

901 SRDRLQERDR YEHDRERERE RRDTRQREWD RDADKDWPRN RDRDRLRERE 

951 RERERDKRRD LDRERERLIS DSVERDRDRD RDRTFESSQI ESVKRCEAKL 

1001 EGEHERDLES TSRDSLALDK ERMDKDLGSV QGFEDTNKSE RTESLEAGDD 

1051 ESKLDDAHSL GSGAGEGYEP ISDDELDEIL AGDAEKREDQ QDEEKMPDPL 

1101 DVIDVDWSGL MPKHPKEPRE PGAALLKFTP GAVMLRVGIS KKLAGSELFA 

1151 KVKETCQRLL EKPKGSFILL 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_12dia, frame 1 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_12dia, frame 1 

Report for DKFZphtes3_12dia.l 



[LENGTH] 1267 

[MW] 150513.15 

CpIJ 1-52 

[HOMOL] TREMBL:AB020660_1 gene: "KIAA0853"; product: "KIAA0853 protein"; Homo sapiens mRNA for KIAA0853 protein, partial cds. 0.0 

[BLOCKS] BL00422C Granins proteins 

[BLOCKS] BL00aO3F 

[BLOCKS] PR0030fiC 

[BLOCKS] PRDlDfllB 

[BLOCKS] PR000M1D 

[BLOCKS] PR01083A 

[BLOCKS] PRD05MSA 

[BLOCKS] BLDOOMa Protamine P1 proteins 

[BLOCKS] PFQllMOD 

[BLOCKS] PRD0a33H 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 44.12 % 



SEQ KDRGRDFERlJREKRDKPRSTSPAGQHHSPISSRHHSSSSQSGSSIQRHSPSPRRKRTPSP 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD ccccchhhhhhhhccccccccccccccccccccccccccccccceeeccccccccccccc 

SEQ SYQRTLTPPLRRSASPYPSHSLSSPQRKQSPPRHRSPMREKGRHDHERTSQSHDRRHERR 

SEG x xxxxxxxxxxxxx xxxxxxxx 

PRD ccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhhc 

SEQ EDTRGKRDREKDSREEREYEQDQSSSRDHRDDREPRDGRDRRDARDTRDRRELRDSRDMR 

SEG XX • xxxxxxxxxxxxxxxxx • xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
PRD cccccccccccchhhhhhhhhhccccccccccccccccccchhhhhhhhhhhhhhhcccc 



SEQ DSREMRDYSRDTKESRDPRDSRSTRDAHDYRDREGRDTHRKEDTYPEESRSYGRNHLREE 

SEG xxxxxxxxxxx- • • xxxxxxxxxxxx 

PRD hhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccc 

SEQ SSRTEIRNESRNESRSEIRNDRMGRSRGRVPELPEKGSRGSRGSQIDSHSSNSNYHDSUE 

SEG xxxxxxxxxxxx- - 

PRD hhhhhhhccccccccccccccccccccccccccccccccccccccccccccccccccccc 


SEQ TRSSYPERDRYPERDNRDQARDSSFERRHGERDRRDNRERDQRPSSPIRHQGRNDELERD 

SEG xxxxxxxxxxxxxxxxxx xxxxxx 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccchhhhhh 

SEQ ERREERRVDRVDDRRDERARERDRERERDRERERERERERDREREKERELERERARERER 

SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ EREKERDRERDRDRDHDRERERERERDREKEREREREERERERERERERERERERERERA 

SEG xxxxxx xxxxxxxxxxxx XXXXXXXXXXX XXX XX xxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEQ RERDKERERQRDWEDKDKGRDDRREKREEIREDRNPRDGHDERKSKKRYRNEGSPSPRQS 

SEG XXX XXXXX X - XXXXXXXX XXXXXXXXXXX XX XXX xxxxxxx 

PRD hhhhhhhhhhhhhhhccccccchhhhhhhhhhccccccccccchhhhhhccccccccccc 

SEQ PKRRREHSPDSDAYNSGDDKNEKHRLLSQVVRPQESRSLSPSHLTEDRQGRWKEEDRKPE 

SEG xxxxx 

PRD ccccccccccccccccccccchhhhhhhhhcccccccccccccccchhhhhhhhhhccch 


SEQ RKESSRRYEEQELKEKVSSVDKQREQTEILESSRMRAQDIIGHHQSEDRETSDRAHDENK 

SEG x 

PRD hhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhheeeeccccccc 

SEQ KKAKIQKKPIKKKKEDDVGIERGNIETTSEDGQVFSPKKGQKKKSIEKKRKKSKGDSDIS 

SEG xxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhccccccccccccccccceeecccceeecccccchhhhhhhhhhccccccccc 

SEQ DEEAAQQSKKKRGPRTPPITTKEELVEMCNGKNGILEDSQKKEDTAFSDWSDEDVPDRTE 

SEG xxx xx 

PRD hhhhhhhhhhcccccccccccchhhhhhccccccGGecccccccccccccccccccceee 

SEQ VTEAEHTATATTPGSTPSPLSSLLPPPPPVATATATTVPATLAATTAAAATSFSTSAITI 

SEG xxxxxx xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 

PRD hhhhhhhhccccccccccceeeccccccceeeeeeecccchhhhhhhhhhhhccccceeG 

SEQ STSATPTNTTNNTFANEDSHRKCHRTRVEKVETPHVTIEDAQHRKPMDQKRSSSLGSNRS 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxx 

PRD eccccccccccccccccccccchhhhhegeGcccGGeecccccccccccccccccccccc 


SEQ NRSHTSGRLRSPSNDSAHRSGDDQSGRKRVLHSGSRDREKTKSLEITGERKSRIDQLKRG 

SEG xxx 

PRD ccccccccccccccccccccccccccceeGGGCCcccccccGeGeehhhhhhhhhhhhcc 

SEQ EPSRSTSSDRQDSRSHSSRRSSPESDRQVHSRSGSFDSRDRLQERDRYEHDRERERERRD 

SEG - - XXXXXXXXXXXXXXXXX XXXXXXXX XX xxx xxxxxxx xxxxx 

PRD cccccccccccccccccccccccccccGGGGCccccccchhhhhhhhhhchhhhhhhhhh 
SEQ TRQREWDRDADKDWPRNRDRDRLRERERERERDKRRDLDRERERLISDSVERDRDRDRDR 

SEG xxxxxxxxx xxxxxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxx 

PRD hhhhhhhhccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccce 

SEQ TFESSQIESVKRCEAKLEGEHERDLESTSRDSLALDKERMDKDLGSVQGFEDTNKSERTE 

SEG 

PRD eechhhhhhhhhhhhhhhhhhccccccccccchhhhhhhhhhcccccccccccccccccc 

SEQ SLEAGDDESKLDDAHSLGSGAGEGYEPISDDELDEILAGDAEKREDQQDEEKMPDPLDVI 

SEG 

PRD cccccccccccccccccccccccccccccccccceeeecchhhhhhhhhhhcccccceee 

SEQ DVDWSGLMPKHPKEPREPGAALLKFTPGAVMLRVGISKKLAGSELFAKVKETCQRLLEKP 

SEG 

PRD eccccccccccccccccccceeeeeccceeeeeecccccccchhhhhhhhhhhhhhhhcc 

SEQ KGSFILL 

SEG 

PRD ccccccc 



(No Prosite data available for DKFZphtes3_12dia.l) 
(No Pfam data available for DKFZphtes3_15dlfl.l) 
PCT/IB01/02050 



group: testis derived 

DKFZphtes3_mi7 encodes a novel 815 amino acid protein without 
similarity to known proteins. 

The mRNA is transcribed ubiquitously. 

No informative BLAST results; no predictive Prosite, Pfam or SCOP 
motifs. 

The new protein can find application in studying the expression 
profile of testis-specific genes. 



similarity to C. elegans B0412.3 

see also DKFZphtes3_17n3 
perhaps complete cds. 

Sequenced by BMFZ 

Locus: unknown 

Insert length: 3555 bp 

Poly A stretch at pos- 315fcn polyadenylation signal at pos- 3137 


1 AACACATCGA CTTGTGTAAG AAAAAGATTG GAAGTGCGGA GCTGTCTTTT 

51 GAGCATGATG CATGGATGTC TAAACAATTC CAGGCCTTTG GAGATTTATT 

101 TGATGAAGCT ATTAAGTTAG GGTTAACAGC TATTCAAACT CAGAATCCTG 

151 GTTTCTATTA CCAGCAGGCA GCATACTATG CCCAGGAGCG GAAACAGCTT 

201 GCAAAAACCC TCTGTAACCA CGAAGCTTCT GTAATGTATC CCAATCCTGA 

251 TCCCTTAGAA ACACAAACAG GCGTTCTTGA CTTTTATGGA CAAAGATCAT 

301 GGCGACAAGG AATACTAAGT TTTGATCTTT CTGATCCTGA AAAAGAAAAG 

351 GTGGGAATTC TTGCCATTCA GCTGAAGGAG AGAAATGTTG TTCACTCTGA 

401 GATAATCATA ACTCTTCTGA GCAATGCTGT TGCACAGTTC AAGAAGTATA 

451 AGTGCCCGCG AATGAAAAGT CACCTAATGG TTCAGATGGG AGAGGAATAT 

501 TATTACGCAA AGGATTATAC CAAAGCTTTG AAGTTGCTGG ATTATGTGAT 

551 GTGTGATTAT CGGAGTGAAG GATGGTGGAC TCTGCTCACT TCTGTATTAA 

601 CTACAGCTCT GAAGTGCTCC TACCTCATGG CCCAATTAAA GGATTACATT 

651 ACTTACTCCC TAGAACTCCT TGGTAGAGCT TCAACTCTGA AAGATGACCA 

701 GAAGTCTCGG ATAGAAAAGA ACCTCATAAA TGTTTTAATG AATGAAAGTC 

751 CTGATCCAGA ACCCGACTGT GATATCTTAG CTGTGAAAAC TGCTCAGAAG 

801 CTGTGGGCAG ACCGAATTTC TCTGGCTGGC AGCAATATTT TCACAATAGG 

851 AGTACAGGAC TTTGTGCCAT TTGTGCAGTG CAAAGCCAAG TTTCATGCCC 

901 CAAGTTTTCA TGTTGATGTT CCTGTTCAGT TTGATATTTA TCTGAAGGCT 

951 GATTGTCCAC ATCCCATTAG GTTTTCCAAG CTCTGTGTCA GCTTTAATAA 

1001 TCAGGAATAC AACCAGTTCT GTGTAATAGA AGAAGCATCC AAAGCAAATG 

1051 AAGTTTTAGA AAATCTGACT CAAGGAAAGA TGTGCCTAGT TCCTGGCAAA 

1101 ACAAGAAAAC TGTTATTTAA GTTTGTTGCA AAAACTGAAG ATGTGGGAAA 

1151 GAAAATTGAG ATTACTTCAG TGGATCTTGC TCTGGGCAAT GAGACGGGAA 

1201 GATGTGTGGT TTTAAATTGG CAGGGAGGAG GAGGAGATGC TGCTTCCTCC 

1251 CAAGAAGCCT TACAGGCAGC TCGGTCTTTC AAAAGGCGAC CTAAGCTACC 

1301 TGACAATGAA GTTCACTGGG ACAGCATTAT AATTCAGGCA AGCACAATGA 

1351 TCATATCCAG AGTCCCAAAC ATTTCTGTAC ATCTGCTACA TGAACCCCCT 
1401 GCACTGACTA ATGAAATGTA TTGTTTGGTT GTGACTGTTC AGTCCCATGA 

1451 AAAGACCCAA ATCAGAGATG TGAAGCTCAC TGCTGGCTTA AAACCAGGAC 

1501 AGGATGCCAA TTTAACTCAG AAGACTCACG TGACTCTTCA TGGACCAGAA 

1551 CTGTGTGATG AATCCTACCC GGCTTTACTC ACTGACATTC CTGTTGGAGA 

1601 CTTACATCCA GGGGAACAGC TGGAAAAAAT GTTGTATGTT CGCTGTGGAA 

1651 CAGTGGGTTC CAGAATGTTT CTTGTATATG TTTCTTACCT GATAAATACA 

1701 ACCGTTGAAG AAAAAGAAAT TGTTTGCAAG TGTCACAAGG ATGAAACTGT 

1751 AACAATTGAA ACAGTCTTTC CATTTGATGT TGCGGTTAAA TTTGTTTCTA 

1801 CCAAGTTTGA GCACCTGGAA AGGGTTTATG CTGACATCCC CTTTCTGTTG 

1851 ATGACGGACC TCTTAAGTGC CTCACCCTGG GCCCTCACTA TTGTTTCCAG 

1901 TGAGCTCCAG CTTGCTCCAT CCATGACCAC AGTGGACCAG CTCGAGTCTC 

1951 AAGTGGACAA TGTTATCTTA CAGACTGGAG AGAGTGCTAG TGAATGCTTT 

2001 TGTCTTCAAT GCCCATCTCT TGGAAATATT GAAGGTGGAG TAGCAACCGG 

2051 GCATTATATT ATCTCTTGGA AAAGGACCTC AGCAATGGAG AATATCCCCA 

2101 TCATCACAAC TGTCATCACT CTGCCGCACG TGATTGTGGA GAATATCCCT 

2151 CTCCATGTGA ATGCAGATCT GCCGTCATTT GGGCGTGTCA GAGAGTCGTT 

2201 ACCTGTCAAG TATCACCTAC AGAATAAGAC CGACTTAGTT CAAGATGTAG 

2251 AAATTTCTGT GGAGCCCAGT GATGCCTTCA TGTTCTCAGG TCTCAAACAG 

2301 ATTCGATTAC GTATCCTCCC TGGCACGGAG CAGGAAATGC TATATAATTT 

2351 CTATCCTCTG ATGGCTGGAT ACCAGCAGCT GCCATCTCTC AACATCAACT 

2401 TGCTTAGATT TCCTAACTTC ACAAATCAGC TGCTCAGGCG TTTTATACCT 

2451 ACCAGTATTT TTGTCAAGCC ACAGGGTCGA CTCATGGATG ATACCTCTAT 

2501 TGCTGCTGCA TGATGTTCAA GACCGGCCCT TGGCTGTTGT TACAGAGATG 

2551 TTGGGCAGAG CTATGCAGGT GTTTCATTGT GAACTCTAGC TTTGATGATG 

2601 GTAAAAAGTT AACCTTTTCT ATTTTTTAAT GGATGTTATA CCAACTATTC 

2651 AGAGGAACTC ATACTTCAAA AATATTAGGA AAATCTGTCT TATAGTTTCT 

2701 CTAATAAATA TCTGAAATCT CAGTACGACA TGAAAGAATG TCAGACCATT 

2751 GTTATTGTTG AAAGTCATTT GATGAATGGT AAATTCTATG AAAAGTAAGT 

2801 GATTTGCATG TATAATATCA GGAAAATTAA GCATCCCAAG TGTGACTGGA 

2851 CAAAGAGAGC AGATGCACCA GTGCCTGTGC CATAAAGTTC CGAATCCCCC 

2901 ATGTGTCTCT TTCAGAGCTG GCCAGACCGG AAATAAATCA TTCTCATAAA 

2951 TTCAGTGTGT ACTCAGAACA GATACACAAC AACATAGGGA GTTGTATGAC 

3001 TGATACGGAA AACTTCCAGA AAGTTTTAAT CAAAGCAGTT TAATTAAGGT 

3051 ATCAAAAATA TCTTTGCTTA CTATCAAGAA GTGTCAAATA GGTTCAGCTT 

3101 GCTGCCAAAA TATGGATCAT TTATGAAGCA GGTTCATATT TTAGAGGTGT 

3151 TAATAAAATC CTCATGGGAA AAGATCCAAA GTGCAAGGAT TTGATTATAA 

3201 ACATAATTTC CTAGACTGAA AGTTTTTGGA AAAGATGCAG GGTCTGAGTC 

3251 AGGCCTTCTG GTTATATTGT GCAGTTTCAA AAGAACTATT TAAAACTCTT 

3301 GAAAACTCAT GTAAATAAAA ATCATAGGGT GAAAATTGTA TTTGTTAAAA 

3351 TACCTTAATA ATTTAAAATG ACCTGATTTC CTGGAAAATT TTATTATTCA 

3401 AAAGGTGGAG GCATTGTAAA AAGGAAATAG TGATGTAAAT AAACATGTTC 

3451 TCTTTCAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 

3501 AAAAAAAAAA AAAAAAAAAA AA 

45 

BLAST Results 



No BLAST result 


Medline entries 



No Medline entry 
Peptide information for frame 3 



ORF from 66 bp to 2510 bp, peptide length: 815 
Category: similarity to unknown protein 
Classification: no clue 

1 MSKQFQAFGD LFDEAIKLGL TAIQTQNPGF YYQQAAYYAQ ERKQLAKTLC 

51 NHEASVMYPN PDPLETQTGV LDFYGQRSWR QGILSFDLSD PEKEKVGILA 

101 IQLKERNVVH SEIIITLLSN AVAQFKKYKC PRMKSHLMVQ MGEEYYYAKD 

151 YTKALKLLDY VMCDYRSEGW WTLLTSVLTT ALKCSYLMAQ LKDYITYSLE 

201 LLGRASTLKD DQKSRIEKNL INVLMNESPD PEPDCDILAV KTAQKLWADR 

251 ISLAGSNIFT IGVQDFVPFV QCKAKFHAPS FHVDVPVQFD IYLKADCPHP 

301 IRFSKLCVSF NNQEYNQFCV IEEASKANEV LENLTQGKMC LVPGKTRKLL 

351 FKFVAKTEDV GKKIEITSVD LALGNETGRC VVLNWQGGGG DAASSQEALQ 

401 AARSFKRRPK LPDNEVHWDS IIIQASTMII SRVPNISVHL LHEPPALTNE 

451 MYCLVVTVQS HEKTQIRDVK LTAGLKPGQD ANLTQKTHVT LHGPELCDES 

501 YPALLTDIPV GDLHPGEQLE KMLYVRCGTV GSRMFLVYVS YLINTTVEEK 

551 EIVCKCHKDE TVTIETVFPF DVAVKFVSTK FEHLERVYAD IPFLLMTDLL 

601 SASPWALTIV SSELQLAPSM TTVDQLESQV DNVILQTGES ASECFCLQCP 

651 SLGNIEGGVA TGHYIISWKR TSAMENIPII TTVITLPHVI VENIPLHVNA 

701 DLPSFGRVRE SLPVKYHLQN KTDLVQDVEI SVEPSDAFMF SGLKQIRLRI 

751 LPGTEQEMLY NFYPLMAGYQ QLPSLNINLL RFPNFTNQLL RRFIPTSIFV 

801 KPQGRLMDDT SIAAA 




BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_mi7, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_mi7, frame 3 






Report for DKFZphtes3_im?.3 



[LENGTH] 836 
EMU! ^24=1.30 

[pI] 5.34 

[HOMOL] TREMBL:CEUB0412_2 gene: "B0412.3"; Caenorhabditis elegans cosmid B0412. 6e-30 

[KW] Alpha_Beta 
[KW] LOW_COMPLEXITY 1.20 % 


SEQ HIDLCKKKIGSAELSFEHDAWMSKQFQAFGDLFDEAIKLGLTAIQTQNPGFYYQQAAYYA 

SEG xxxxxxxxx 

PRD ccceeeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeeccccchhhhhhhhh 

SEQ QERKQLAKTLCNHEASVMYPNPDPLETQTGVLDFYGQRSWRQGILSFDLSDPEKEKVGIL 

SEG x 

PRD hhhhhhhhhhhhhcceeeccccccccceeeeeeeeccccceeeceeeeeccchhhhhhhh 
SEQ AIQLKERNVVHSEIIITLLSNAVAQFKKYKCPRMKSHLMVQMGEEYYYAKDYTKALKLLD 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccceeehhhhhhhhhhhh 

SEQ YVMCDYRSEGWWTLLTSVLTTALKCSYLMAQLKDYITYSLELLGRASTLKDDQKSRIEKN 

SEG 

PRD hhhhccccccceeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchhhhhccccchhhhh 

SEQ LINVLMNESPDPEPDCDILAVKTAQKLWADRISLAGSNIFTIGVQDFVPFVQCKAKFHAP 

SEG 

PRD hheeeeccccccccccchhhhhhhhhhhhhhhhhhcccceeeeeeehhhhhhhhhhcccc 

SEQ SFHVDVPVQFDIYLKADCPHPIRFSKLCVSFNNQEYNQFCVIEEASKANEVLENLTQGKM 

SEG 

PRD eeeeeeecceeeeeecccccceeeeeeeeecccccccceeeeeccccchhhhhccccccc 

SEQ CLVPGKTRKLLFKFVAKTEDVGKKIEITSVDLALGNETGRCVVLNWQGGGGDAASSQEAL 

SEG 

PRD cccccccccchhhhhhhhhccceeeeeeeecccccccccceeeeeecccccccchhhhhh 


SEQ QAARSFKRRPKLPDNEVHWDSIIIQASTMIISRVPNISVHLLHEPPALTNEMYCLVVTVQ 

SEG 

PRD hhhhhhhhcccccccccccceeeeeeeceeeeccccceeeeeccccccccceeeeeeeee 

SEQ SHEKTQIRDVKLTAGLKPGQDANLTQKTHVTLHGPELCDESYPALLTDIPVGDLHPGEQL 

SEG 

PRD cccccceeeeeccccccccchhhhhheeeeecccccccccceeeeecccccccccccchh 

SEQ EKMLYVRCGTVGSRMFLVYVSYLINTTVEEKEIVCKCHKDETVTIETVFPFDVAVKFVST 

SEG 

PRD hhhhhhcccccchhhhhcchhhhhccccceeeeeeecccccceeeeeeccceeeeeeeeh 

SEQ KFEHLERVYADIPFLLMTDLLSASPWALTIVSSELQLAPSMTTVDQLESQVDNVILQTGE 

SEG 

PRD hhhhhhhhhhccceeeehhhhhccccceeehhhhhhhhccceeeccccccccceeeeccc 

SEQ SASECFCLQCPSLGNIEGGVATGHYIISWKRTSAMENIPIITTVITLPHVIVENIPLHVN 

SEG 

PRD cceeeeeeeecccccccccccceeeeeeeeccccccccceeeeeeeeeeeeeeecccccc 


SEQ ADLPSFGRVRESLPVKYHLQNKTDLVQDVEISVEPSDAFMFSGLKQIRLRILPGTEQEML 

SEG 

PRD cccccccceeeccceeeeeecccccceeeeeecccccceeeccccccceeeccccccccc 

SEQ YNFYPLMAGYQQLPSLNINLLRFPNFTNQLLRRFIPTSIFVKPQGRLMDDTSIAAA 

SEG 

PRD cccccccccccccccccccccccccccchhhhhcccceeeeecccccccccccccc 

(No Prosite data available for DKFZphtes3_mi?-3) 

(No Pfam data available for DKFZphtes3_1417 - 3) 
group: testis derived 

DKFZphtes3_l£nm encodes a novel 713 amino acid protein with weak 
similarity to the neurofilament triplet M protein of the rat. 
Neurofilaments are the intermediate filaments specific to nervous 
tissue. They are probably essential to the tensile strength of 
the neuron, as well as to the transport of molecules and organelles 
within the axon. Until now, ESTs of the novel mRNA have been isolated 
only from testis, germ cells and uterus. 
No informative BLAST results; no predictive Prosite, Pfam or SCOP 
motifs. 

The new protein can find application in studying the expression 
profile of testis-specific genes. 


similarity to neurofilament triplet M protein - rat 

few EST hits (6 of 9 hits from testis) 
perhaps complete cds. 

Sequenced by GBF 

Locus: unknown 


Insert length: 235=1 bp 

Poly A stretch at pos. 2328, polyadenylation signal at pos. 2306 



1 TGGGCCCCAC CTCCTCAGCA CAACTTTCTG AAAAACTGGC AGCGTAACAC 

51 AGCCCTGCGG AAGAAGCAGC AGGAAGCCCT CAGCGAACAC CTAAAGAAGC 

101 CAGTGAGTGA GCTGCTCATG CACACCGGGG AGACCTACAG ACGGATCCAG 

151 GAGGAGCGGG AGCTCATTGA CTGCACACTT CCAACCCGGC GTGATAGGAA 

201 AAGCTGGGAG AACAGTGGGT TCTGGAGTCG ACTGGAATAC TTGGGAGATG 

251 AGATGACAGG TCTGGTCATG ACCAAGACAA AAACTCAGCG TGGCCTCATG 

301 GAGCCCATCA CTCATATCAG GAAGCCCCAC TCCATCCGGG TGGAGACAGG 

351 ATTACCAGCC CAGAGGGACG CTTCATACCG CTACACCTGG GATCGGAGTC 

401 TGTTTCTGAT CTACCGACGC AAGGAGCTGC AGAGAATCAT GGAAGAGCTG 

451 GATTTCAGCC AGCAGGATAT TGATGGCCTG GAGGTGGTGG GCAAAGGGTG 

501 GCCCTTCTCG GCTGTTACTG TGGAAGACTA CACAGTGTTT GAAAGAAGTC 

551 AGGGAAGCTC CTCTGAAGAC ACAACATACT TAGGCACATT GGCCAGTTCC 

601 TCTGATGTCT CCATGCCTAT TCTCGGCCCT TCTCTGCTGT TCTGTGGGAA 

651 GCCAGCTTGC TGGATCAGAG GCAGTAATCC ACAGGACAAG AGGCAGGTTG 

701 GGATTGCTGC TCACTTGACC TTTGAAACCC TAGAAGGCGA GAAAACCTCC 

751 TCAGAACTGA CTGTGGTCAA TAATGGCACC GTGGCCATTT GGTATGACTG 

801 GCGACGGCAG CACCAGCCGG ACACTTTCCA AGACCTTAAG AAAAACAGGA 

851 TGCAGCGATT TTACTTTGAC AACCGGGAAG GTGTGATTCT GCCTGGAGAA 

901 ATTAAAACAT TTACCTTCTT CTTCAAGTCT TTGACTGCTG GGGTCTTCAG 

951 GGAATTTTGG GAGTTTCGAA CCCATCCTAC TCTATTAGGA GGTGCTATAC 

1001 TGCAGGTCAA TCTCCACGCG GTCTCCCTGA CCCAGGACGT TTTTGAGGAT 

1051 GAGAGGAAAG TACTGGAGAG CAAGCTGACT GCCCATGAGG CAGTCACCGT 

1101 CGTTCGCGAA GTGCTGCAGG AGCTGCTGAT GGGGGTCTTG ACCCCGGAGC 

1151 GCACACCATC ACCTGTGGAT GCCTATCTCA CCGAGGAAGA CTTGTTCCGG 
1201 CACAGAAATC CTCCGCTGCA TTATGAGCAC CAAGTGGTGC AAAGCCTGCA 

1251 CCAACTGTGG CGCCAGTACA TGACCCTGCC CGCCAAGGCT GAGGAGGCCA 

1301 GGCCAGGGGA CAAGGAGCAC GTCAGCCCCA TAGCCACAGA GAAGGCCTCT 

1351 GTGAATGCTG AGCTGTTACC ACGCTTTAGG AGCCCCATCT CCGAAACTCA 

1401 AGTGCCCCGG CCTGAGAACG AGGCCCTCAG GGAATCCGGG TCCCAGAAGG 

1451 CCAGAGTGGG GACCAAGAGT CCTCAGCGGA AGAGCATCAT GGAGGAGATC 

1501 CTGGTGGAGG AAAGCCCAGA TGTGGACAGC ACCAAGAGCC CCTGGGAGCC 

1551 GGATGGCCTT CCCCTGCTGG AGTGGAACCT CTGCTTGGAG GACTTCAGAA 

1601 AGGCAGTGAT GGTGCTCCCT GATGAGAACC ACAGAGAGGA TGCGTTGATG 

1651 AGGCTCAACA AAGCAGCCCT GGAGCTGTGC CAGAAGCCAA GGCCATTGCA 

1701 GTCCAACCTC CTGCACCAGA TGTGTTTGCA GCTGTGGCGA GATGTGATTG 

1751 ACAGCCTGGT GGGCCATTCC ATGTGGCTGA GGTCTGTGCT GGGCCTGCCT 

1801 GAGAAGGAGA CCATCTATTT GAATGTGCCT GAAGAGCAAG ATCAAAAATC 

1851 ACCTCCTATC ATGGAAGTGA AGGTACCTGT GGGGAAAGCT GGGAAGGAGG 

1901 AGCGGAAAGG AGCAGCCCAG GAAAAGAAGC AACTGGGGAT CAAAGACAAA 

1951 GAAGACAAGA AAGGAGCCAA GCTGCTCGGG AAAGAGGACC GTCCCAACAG 

2001 CAAGAAGCAC AAGGCAAAGG ATGACAAGAA AGTCATAAAA TCTGCAAGTC 

2051 AGGACAGGTT TTCTTTGGAA GACCCTACCC CTGACATCAT CCTCTCTTCT 

2101 CAAGAACCCA TAGACCCCCT GGTCATGGGG AAATACACCC AGAGGCTGCA 

2151 CAGTGAGGTC CGTGGGCTGC TGGACACCCT GGTGACCGAC CTGATGGTCC 

2201 TGGCTGATGA GCTCAGCCCC ATAAAGAATG TCGAGGAGGC TTTGCGCCTC 

2251 TGCAGGTGAC TCTCGGGCCC AAGCAACCTT CTGGAAAACG GGTTAATAAA 

2301 TAAATCAATA AAGAACCTTC AAGTTTCTAC TAAAAAAAAA AAAAAAAAAA 

2351 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA GGGCGGCCG 


BLAST Results 

No BLAST result 



Medline entries 




No Medline entry 



Peptide information for frame 1 



ORF from 118 bp to 2256 bp; peptide length: 713 
Category: putative protein 
Classification: Cell structure/motility 

1 MHTGETYRRI QEERELIDCT LPTRRDRKSW ENSGFWSRLE YLGDEMTGLV 

51 MTKTKTQRGL MEPITHIRKP HSIRVETGLP AQRDASYRYT WDRSLFLIYR 

101 RKELQRIMEE LDFSQQDIDG LEVVGKGWPF SAVTVEDYTV FERSQGSSSE 

151 DTTYLGTLAS SSDVSMPILG PSLLFCGKPA CWIRGSNPQD KRQVGIAAHL 

201 TFETLEGEKT SSELTVVNNG TVAIWYDWRR QHQPDTFQDL KKNRMQRFYF 

251 DNREGVILPG EIKTFTFFFK SLTAGVFREF WEFRTHPTLL GGAILQVNLH 

301 AVSLTQDVFE DERKVLESKL TAHEAVTVVR EVLQELLMGV LTPERTPSPV 

351 DAYLTEEDLF RHRNPPLHYE HQVVQSLHQL WRQYMTLPAK AEEARPGDKE 

401 HVSPIATEKA SVNAELLPRF RSPISETQVP RPENEALRES GSQKARVGTK 

451 SPQRKSIMEE ILVEESPDVD STKSPWEPDG LPLLEWNLCL EDFRKAVMVL 

501 PDENHREDAL MRLNKAALEL CQKPRPLQSN LLHQMCLQLW RDVIDSLVGH 

551 SMWLRSVLGL PEKETIYLNV PEEQDQKSPP IMEVKVPVGK AGKEERKGAA 

601 QEKKQLGIKD KEDKKGAKLL GKEDRPNSKK HKAKDDKKVI KSASQDRFSL 

651 EDPTPDIILS SQEPIDPLVM GKYTQRLHSE VRGLLDTLVT DLMVLADELS 

701 PIKNVEEALR LCR 
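The ORF coordinates and peptide lengths in these records appear consistent with 1-based, inclusive coordinates that end just before the termination codon: for this clone, (2256 - 118 + 1) / 3 = 713 codons, one per residue. The short Python sketch below reproduces that arithmetic and translates a stretch of sequence with the standard genetic code. The helper names `peptide_length` and `translate` are illustrative assumptions, not part of the annotation pipeline that produced these records. As a check, the 21 bases at positions 2239 to 2259 of this insert (GCTTTGCGCCTCTGCAGGTGA) translate to the C-terminal residues ALRLCR followed by a TGA stop.

```python
from itertools import product

# Standard genetic code; codons are enumerated in TCAG order
# (first base varies slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residue count for an ORF given 1-based inclusive coordinates
    that exclude the stop codon (the convention assumed here)."""
    span = orf_end - orf_start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    return span // 3


def translate(dna: str) -> str:
    """Translate from the first base, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    print(peptide_length(118, 2256))           # 713
    print(translate("GCTTTGCGCCTCTGCAGGTGA"))  # ALRLCR
```

The lookup table is built from the TCAG ordering rather than typed out codon by codon, which keeps all 64 entries in one auditable string.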



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_lSnm, frame 1 
No Alert BLASTP hits found 
Pedant information for DKFZphtes3_15nm, frame 1 



Report for DKFZphtes3_15nm.1 



[LENGTH] 713 
Ell III J A17fl0-S3 
Cpll t •□□ 

[BLOCKS] PF0Dfl7flC 
[BLOCKS] BLDDblDC DEAH-box subfamily ATP-dependent helicases 
proteins 

[KW] Alpha_Beta 

[KW] LOW_COMPLEXITY 4.07 % 

SEQ MHTGETYRRIQEERELIDCTLPTRRDRKSWENSGFWSRLEYLGDEMTGLVMTKTKTQRGL 

SEG 

PRD ccchhhhhhhhhhhhhhhhccccchhhhhhccccccceeeeccccceeeeeecccccccc 

SEQ MEPITHIRKPHSIRVETGLPAQRDASYRYTWDRSLFLIYRRKELQRIMEELDFSQQDIDG 

SEG 

PRD cccccccccccceeeeeccccchhhhhhhcccchhhhhhhhhhhhhhhhhhcccccccce 

SEQ LEVVGKGWPFSAVTVEDYTVFERSQGSSSEDTTYLGTLASSSDVSMPILGPSLLFCGKPA 

SEG 

PRD eeeeeccccceeeeecceeeeeecccccccceeecccccccccccccccccceeeecccc 

SEQ CWIRGSNPQDKRQVGIAAHLTFETLEGEKTSSELTVVNNGTVAIWYDWRRQHQPDTFQDL 

SEG 

PRD eeeeccccccchhhhhhhhhheeecccccccceeeeecccceeeeehhhhhccccchhhh 

SEQ KKNRMQRFYFDNREGVILPGEIKTFTFFFKSLTAGVFREFWEFRTHPTLLGGAILQVNLH 

SEG 

PRD hhhhhhhhhcccccccccccceeeeeeeehhhhhhhhhhhhhhhcccccccchhhhhhhh 


SEQ AVSLTQDVFEDERKVLESKLTAHEAVTVVREVLQELLMGVLTPERTPSPVDAYLTEEDLF 

SEG 

PRD hhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccceeeeccccc 

SEQ RHRNPPLHYEHQVVQSLHQLWRQYMTLPAKAEEARPGDKEHVSPIATEKASVNAELLPRF 

SEG 

PRD cccccccccccchhhhhhhhhhhhhhhhhhhhhccccccccccccchhhhhhhhhccccc 
SEQ RSPISETQVPRPENEALRESGSQKARVGTKSPQRKSIMEEILVEESPDVDSTKSPWEPDG 

SEG 

PRD cccccccccccccchhhhhcccccccccccccchhhhhhhhhhhcccccccccccccccc 

SEQ LPLLEWNLCLEDFRKAVMVLPDENHREDALMRLNKAALELCQKPRPLQSNLLHQMCLQLW 

SEG 

PRD ccccchhhhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhcccccchhhhhhhhhhhh 

SEQ RDVIDSLVGHSMWLRSVLGLPEKETIYLNVPEEQDQKSPPIMEVKVPVGKAGKEERKGAA 

SEG 

PRD hhhhhhhhccchhhhhhccccccceeeeecccccccccccceeeeeccccchhhhhhhhh 

SEQ QEKKQLGIKDKEDKKGAKLLGKEDRPNSKKHKAKDDKKVIKSASQDRFSLEDPTPDIILS 

SEG xxxxxxxxxxxxxxxxx 

PRD hhhhhhccccccccchhhhhccccccccccccccccceeeeecccccccccccccceeee 

SEQ SQEPIDPLVMGKYTQRLHSEVRGLLDTLVTDLMVLADELSPIKNVEEALRLCR 

SEG xxxxxxxxxxxx 

PRD ccccccceeechhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhccc 






(No Prosite data available for DKFZphtes3_15r.m .1) 
(No Pfam data available for DKFZphtes3_lSnm • 1) 
DKFZphtes3_lbbS 






group: cell structure and motility 

DKFZphtes3_lbb5 encodes a novel 268 amino acid protein with 
similarity to various tropomyosins. 

Tropomyosins play regulatory roles in cellular structure and 
transport. 






The new protein can find application in modulating cell structure 
and motility as well as in modulating cellular transport. 
weak similarity to KIAA0774 
perhaps complete cds. 
Sequenced by BMFZ 
Locus: unknown 

Insert length: 1316 bp 

Poly A stretch at pos. 1247, polyadenylation signal at pos. 1232 
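A polyadenylation signal is typically the hexamer AATAAA lying a short distance upstream of the poly(A) tail. The sketch below is a simple heuristic, not the tool used to annotate these records: it locates the trailing run of A residues and the last AATAAA that lies entirely before it, returning 0-based positions. The function name `locate_polya` is hypothetical. Applied to positions 1201 to 1260 of this insert, it places the hexamer at 1232, the signal position given above, which suggests (assuming that convention) that the reported value is a 0-based start. The poly(A) position it returns follows a plain trailing-run rule and need not match a record whose poly(A) coordinate was derived differently.

```python
from typing import Optional, Tuple

SIGNAL = "AATAAA"  # canonical hexamer only; variants such as ATTAAA are ignored


def locate_polya(seq: str, offset: int = 0) -> Tuple[Optional[int], int]:
    """Return (signal_start, tail_start) as 0-based positions shifted by offset.

    tail_start is where the trailing run of A residues begins; signal_start is
    the last AATAAA lying entirely before that run, or None if there is none.
    """
    seq = seq.upper()
    tail = len(seq.rstrip("A"))       # index of the first base of the trailing A run
    hit = seq.rfind(SIGNAL, 0, tail)  # search only upstream of the tail
    signal = offset + hit if hit != -1 else None
    return signal, offset + tail


if __name__ == "__main__":
    # Positions 1201-1260 of this insert; index 0 corresponds to 0-based position 1200.
    window = ("GGAAGAGTCTTTAACCTTTGTAATTACTGCATAATAAATTTTGTTAGAAT"
              "CAAAAAAAAA")
    print(locate_polya(window, offset=1200))  # (1232, 1251)
```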

1 TGCTAAAATG GAATTAGAGA GAAGCATAGA CATCAGCAGA AGACAGAGTA 

51 AGGAGCACAT ATGTAGAATT ACAGATCTAC AAGAGGAATT AAGACACAGA 

101 GAGCATCACA TCTCTGAATT GGATAAGGAG GTTCAGCACC TTCATGAGAA 

151 TATAAGTGCC CTAACCAAAG AACTGGAATT TAAGGGGAAA GAAATTCTCA 

201 GAATACGAAG TGAATCTAAC CAACAGATAA GGTTGCATGA ACAAGATTTA 

251 AACAAGAGAC TTGAAAAAGA GTTGGATGTC ATGACAGCAG ACCACCTCAG 

301 AGAGAAAAAT ATCATGCGGG CAGATTTTAA TAAGACTAAC GAGCTACTCA 

351 AGGAAATAAA TGCCGCTTTA CAAGTGTCAT TAGAAGAAAT GGAAGAAAAA 

401 TATCTAATGA GAGAATCAAA ACCAGAAGAT ATACAGATGA TTACAGAATT 

451 AAAAGCCATG CTTACAGAAA GAGACCAGAT CATAAAGAAA CTAATTGAGG 

501 ATAATAAGTT TTATCAGCTG GAATTAGTCA ATCGAGAAAC TAACTTCAAC 

551 AAAGTGTTTA ACTCAAGTCC TACTGTTGGT GTTATTAATC CATTGGCTAA 

601 GCAAAAGAAG AAGAATGATA AATCACCAAC AAACAGGTTT GTGAGTGTTC 

651 CCAATCTAAG TGCTCTGGAA TCTGGTGGAG TGGGCAATGG ACATCCTAAC 

701 CGCCTGGATC CCATTCCTAA TTCTCCAGTC CACGATATTG AGTTCAACAG 

751 CAGCAAACCA CTTCCACAGC CAGTGCCACC TAAAGGGCCC AAGACATTTT 

801 TGAGGTATCA GTAAGATGCA TGTGCATGAG CTCAAGGAAC ATGACTACTG 

851 GAGTTTCCAT TACACATTGT TGCGTGCCTT GTAATTTTCC CCAAAGACGT 

901 CCTGCTCAGA GTGAAGCTTC TCCAGTGGCT TCTCCAGATC CCCAGCGCCA 

951 GGAGTGGTTT GCCCGGTACT TCACATTCTG AAAGAATTGT GTTGGCACAG 

1001 CTCTGTATAG ACTGTTACTA AGAGCATGAC TTTATACAGA TTGTTATGTA 

1051 AATAGGCTTT CCTATGTCAA ACACTGTGAA TGAGAAAGTA TTTGTCTCTC 

1101 CAACTTGAAA ATGCACTGTA TTTCCTGTGA TATTTATTGG AATCATTCTA 

1151 TAAGGTACTA TATTATGTGT GTAATTATAA CTGTTATTTT TATTTGAGAT 

1201 GGAAGAGTCT TTAACCTTTG TAATTACTGC ATAATAAATT TTGTTAGAAT 

1251 CAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1301 AAAAAAAAAA AAAAAA 



BLAST Results 



WO 01/98454 
No BLAST result 



Medline entries 

No Medline entry 

Peptide information for frame 2 

ORF from 8 bp to 811 bp; peptide length: 268 
Category: similarity to known protein 
Classification: Cellular transport and traffic 

1 MELERSIDIS RRQSKEHICR ITDLQEELRH REHHISELDK EVQHLHENIS 

51 ALTKELEFKG KEILRIRSES NQQIRLHEQD LNKRLEKELD VMTADHLREK 

101 NIMRADFNKT NELLKEINAA LQVSLEEMEE KYLMRESKPE DIQMITELKA 

151 MLTERDQIIK KLIEDNKFYQ LELVNRETNF NKVFNSSPTV GVINPLAKQK 

201 KKNDKSPTNR FVSVPNLSAL ESGGVGNGHP NRLDPIPNSP VHDIEFNSSK 

251 PLPQPVPPKG PKTFLRYQ 



BLASTP hits 


No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_lfcib5, frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_lt>bS, frame 2 

Report for DKFZphtes3_lbb5.2 

[LENGTH] 270 

EMIiO BlMia.QT 

[pI] 6.10 

[HOMOL] PIR:A57013 early endosome antigen 1 - human 1e-05 

[FUNCAT] 03.19 recombination and dna repair [S. cerevisiae, YOL034w] 1e-05 

[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YFR031c] 2e-05 

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YFR031c] 2e-05 

[FUNCAT] 11.01 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKROISw] 5e-05 

[FUNCAT] 30.04 organization of cytoskeleton [S. cerevisiae, YDR356w] 7e-05 

[FUNCAT] 09.10 nuclear biogenesis [S. cerevisiae, YDR356w] 7e-05 

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 1e-04 

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 1e-04 

[FUNCAT] 1 genome replication, transcription, recombination and repair [M. jannaschii, MJ1643] 2e-04 

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YLR3DTc] 3e-04 

[FUNCAT] 08.16 extracellular transport [S. cerevisiae, YNL272c] 5e-04 

[FUNCAT] SQ-OI organization of intracellular transport vesicles [S. cerevisiae, YNL272c] 5e-04 

[KW] All_Alpha 

[KW] LOW_COMPLEXITY 4.61 % 

[KW] COILED_COIL 10.74 % 



SEQ AKMELERSIDISRRQSKEHICRITDLQEELRHREHHISELDKEVQHLHENISALTKELEF 

SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCC 

SEQ KGKEILRIRSESNQQIRLHEQDLNKRLEKELDVMTADHLREKNIMRADFNKTNELLKEIN 

SEG 

PRD hhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

ccccc 

SEQ AALQVSLEEMEEKYLMRESKPEDIQMITELKAMLTERDQIIKKLIEDNKFYQLELVNRET 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 




SEQ NFNKVFNSSPTVGVINPLAKQKKKNDKSPTNRFVSVPNLSALESGGVGNGHPNRLDPIPN 

SEG 

PRD hhhhhhhcccceeeehhhhhhhhhhccccccceeeccccccccccccccccccccccccc 
COILS 


SEQ SPVHDIEFNSSKPLPQPVPPKGPKTFLRYQ 

SEG xxxxxxxxxxxxx 

PRD ccceeeeecccccccccccccccceeeccc 
45 COILS 



(No Prosite data available for DKFZphtes3_lbbS.2)
(No Pfam data available for DKFZphtes3_lbbS.2)

DKFZphtes3_lbp3

group: testis derived

WO 01/98454 PCT/IB01/02050

DKFZphtes3_lbp3 encodes a novel 1663 amino acid protein without
similarity to known proteins.

The novel protein is glutamine-rich and contains an RGD cell
attachment motif. Judging from the low number of ESTs and their
origin, the protein seems to be expressed ubiquitously at low
levels.
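The description calls the protein glutamine-rich without stating a threshold. A minimal Python sketch (standard library only) of the composition calculation behind such a statement follows; `composition` and `residue_fraction` are illustrative names, and the four-residue test string is hypothetical rather than taken from this entry.

```python
from collections import Counter


def composition(protein: str) -> dict[str, float]:
    """Fraction of each residue in a one-letter protein sequence, most frequent first."""
    counts = Counter(protein.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty sequence")
    return {aa: n / total for aa, n in counts.most_common()}


def residue_fraction(protein: str, residue: str) -> float:
    """Fraction of positions occupied by one residue, e.g. 'Q' for glutamine."""
    return composition(protein).get(residue.upper(), 0.0)


print(residue_fraction("QQAQ", "Q"))  # 0.75
```

Whether a given fraction counts as "rich" depends on the background composition used for comparison, which the source does not specify.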

No informative BLAST results; no predictive PROSITE, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes.

putative protein

perhaps complete cds.
Sequenced by BUFZ 

Locus: unknown

Insert length: 5411 bp

Poly A stretch at pos. 535Mi polyadenylation signal at pos. 53M0 

1 GGCGGCCAGG TGGAGGACCT GAGCAAGCAG CTCAAGCGTG TGGACGGCCA
51 GGTGCAGGGC ATCGCCACGC ACGTGCAGCA CTTCTCCCAG GCCAGCGGGC
101 TTGACCTGGC CGCGCTAGAG TGGCCGGAGG AGCAGGAGGT GGGCGTGCGG
151 GCGTTCGATA GGGTGCGGAC TGGGAGTATC ATGAAGGACG CCGCCGAGGA
201 GCTCAGCTTT GCCAGGGTAC TTTTACAGCG GGTTGATGAA CTAGAGAAGC
251 TATTCAAAGA TCGGGAGCAA TTCCTGGAAC TAGTCAGCCG GAAGCTGAGT
301 TTGGTTCCTG GTGCAGAAGA AGTCACCATG GTCACCTGGG AAGAGCTGGA
351 GCAGGCGATT ACGGACGGCT GGAGAGCCTC ACAAGCGGGC TCAGAAACAC
401 TTATGGGATT TTCTAAGCAC GGAGGGTTCA CTTCCTTAAC ATCACCTGAA
451 GGGACTCTAA GCGGAGACTC TACCAAGCAA CCAAGTATTG AGCAGGCTCT
501 GGATTCTGCC AGTGGTCTTG GCCCGGATCG GACTGCATCA GGATCTGGTG
551 GCACAGCACA CCCCTCTGAT GGGGTTTCCA GTAGGGAACA AAGCAAGGTC
601 CCCTCTGGTA CTGGGAGACA GCAGCAGCCG AGGGCCCGTG ATGAAGCTGG
651 CGTGCCACGA CTCCATCAGT CTTCTACATT CCAATTCAAA TCAGACTCAG
701 ATCGTCACAG GAGTAGAGAG AAGCTTACCT CGACACAACC AAGAAGAAAT
751 GCACGTCCTG GTCCAGTTCA ACAGGACTTA CCCTTGGCCA GAGACCAGCC
801 CAGTAGTGTG CCCGCTAGCC AGAGTCAGGT CCATCTAAGG CCAGATCGTC
851 GTGGGTTAGA ACCAACTGGC ATGAATCAGC CTGGATTAGT GCCTGCTAGC
901 ACTTACCCAC ATGGTGTGGT ACCCCTCAGC ATGGGTCAGC TTGGTGTGCC
951 ACCACCTGAA ATGGATGATC GGGAATTGAT ACCATTTGTC GTGGATGAGC
1001 AACGTATGTT GCCACCATCA GTACCTGGCA GAGACCAGCA AGGATTGGAA
1051 CTACCTAGCA CAGACCAACA TGGTCTGGTT TCAGTCAGTG CATATCAGCA
1101 TGGTATGACA TTTCCTGGCA CAGACCAACG CAGTATGGAA CCACTTGGCA
1151 TGGATCAGCG TGGATGTGTA ATATCAGGCA TGGGTCAGCA AGGACTAGTA
1201 CCCCCTGGTA TAGACCAGCA AGGATTGACA TTGCCTGTCG TCGATCAACA
1251 TGGCCTGGTT CTACCTTTTA CAGACCAGCA TGGTTTGGTA TCACCTGGTT
1301 TGATGCCAAT TAGTGCAGAT CAGCAAGGTT TTGTGCAGCC CAGTTTGGAA
1351 GCAACTGGCT TCATACAACC TGGCACAGAG CAGCATGATT TGATCCAGTC
1401 TGGCAGATTT CAGCGTGCTT TGGTGCAGCG TGGTGCATAT CAGCCTGGCT
1451 TGGTCCAACC TGGTGCAGAT CAGCGTGGTT TGGTCCGGCC TGGAATGGAT
1501 CAGTCTGGTT TGGCCCAACC TGGTGCAGAT CAGCGTGGTT TGGTCTGGCC
1551 TGGAATGGAT CAGTCTGGTT TGGCCCAACC TGGTAGAGAT CAGCATGGTT
1601 TGATCCAGCC TGGCACAGGT CAGCATGATT TGGTCCAATC TGGCACAGGT
1651 CAGGGTGTCT TGGTACAGCC TGGTGTAGAT CAGCCTGGCA TGGTCCAACC
1701 TGGCAGATTT CAGCGTGCTT TGGTGCAGCC TGGTGCATAT CAGCCTGGCT
1751 TGGTCCAACC TGGTGCAGAT CAGATTGATG TGGTGCAACC TGGTGCAGAT
1801 CAGCATGGTT TGGTACAATC TGGTGCAGAT CAGAGTGATT TGGCTCAACC
1851 TGGTGCAGTT CAGCATGGTT TGGTCCAACC TGGAGTAGAT CAGCGTGGTT
1901 TGGCACAACC TCGTGCAGAT CATCAGCGTG GTTTGGTCCC ACCTGGTGCA
1951 GATCAGCGTG GTTTGGTCCA ACCTGGTGCA GATCAGCATG GTTTGGTCCA
2001 ACCTGGAGTG GATCAGCATG GTTTGGCACA ACCTGGTGAA GTTCAGCGTA
2051 GTTTGGTGCA ACCTGGTATA GTTCAGCGTG GTTTGGTGCA ACCTGGTGCA
2101 GTTCAGCGTG GTTTGGTGCA ACCTGGTGCA GTTCAGCGTG GTTTGGTCCA
2151 ACCTGGAGTG GATCAGCGTG GTTTGGTTCA ACCTGGTGCA GTTCAGCGTG
2201 GTTTGGTCCA ACCTGGTGCA GTTCAGCATG GTTTGGTCCA ACCTGGTGCA
2251 GATCAGCGTG GTTTGGTCCA ACCTGGAGTG GATCAGCGTG GTTTGGTGCA
2301 ACCTGGAGTG GATCAGCGTG GTTTGGTCCA ACCTGGAATG GACCAGCGTG
2351 GTTTGATCCA ACCTGGTGCA GATCAGCCTG GTTTGGTCCA GCCTGGTGCA
2401 GGTCAGCTGG GTATGGTGCA GCCTGGAATA GGTCAGCAAG GTATGGTGCA
2451 ACCTCAGGCA GATCCACATG GCCTGGTACA ACCTGGTGCC TATCCTCTTG
2501 GTTTGGTACA ACCTGGTGCA TATTTGCATG ATTTATCTCA ATCTGGGACA
2551 TATCCACGTG GTCTGGTGCA GCCAGGAATG GATCAGTATG GTTTGAGACA
2601 ACCTGGTGCA TATCAGCCAG GCTTGATAGC ACCAGGCACA AAGCTTCGTG
2651 GCTCTTCAAC ATTCCAGGCA GATTCTACAG GTTTTATATC AGTACGTCCA
2701 TATCAACATG GTATGGTACC TCCTGGCAGA GAACAATACG GCCAGGTGTC
2751 ACCACTCCTA GCCAGTCAAG GTTTGGCATC ACCTGGTATA GATCGAAGGA
2801 GTTTGGTACC ACCAGAAACT TATCAGCAAG GTTTGATGCA TCCTGGCACA
2851 GACCAGCACA GCCCAATACC ACTGAGTACA GGTTTGGGAT CTACACACCC
2901 AGATCAACAG CATGTGGCAT CACCTGGCCC AGGTGAGCAT GACCAGGTAT
2951 ACCCAGATGC AGCTCAGCAT GGCCATGCTT TCTCTCTCTT TGACAGTCAT
3001 GATTCAATGT ATCCTGGTTA TCGTGGCCCA GGGTATCTAA GTGCTGATCA
3051 GCATGGCCAG GAAGGTTTGG ATCCAAATAG AACACGAGCC TCGGACCGAC
3101 ATGGAATTCC TGCCCAGAAG GCCCCAGGCC AAGATGTCAC TCTTTTCAGG
3151 AGTCCAGACT CCGTCGACCG AGTCTTATCA GAAGGGAGCG AAGTCTCGAG
3201 TGAAGTCCTG AGTGAGCGAC GCAATTCACT GCGTAGAATG AGTTCTAGTT
3251 TCCCCACGGC AGTGGAGACA TTTCATCTGA TGGGAGAGCT CAGTAGCCTC
3301 TATGTGGGGC TAAAGGAGAG TATGAAGGAT CTGGATGAGG AGCAGGCCGG
3351 CCAAACCGAC TTGGAGAAGA TCCAGTTCCT GCTGGCACAG ATGGTCAAAA
3401 GGACCATACC TCCTGAACTG CAGGAGCAGC TGAAGACCGT AAAGACGCTA
3451 GCCAAAGAAG TTTGGCAGGA GAAAGCAAAA GTGGAAAGGC TGCAGAGGAT
3501 CCTGGAAGGG GAAGGGAATC AAGAAGCAGG GAAGGAACTG AAAGCTGGAG
3551 AGCTGAGATT GCAGCTGGGT GTCCTCAGAG TCACCGTGGC TGACATAGAA
3601 AAGGAGCTGG CCGAGTTGAG GGAGAGCCAA GACAGGGGCA AGGCTGCCAT
3651 GGAAAATTCT GTCTCTGAAG CCTCCCTTTA CCTGCAGGAC CAGTTGGACA
3701 AGCTCAGGAT GATCATTGAG AGCATGCTGA CCTCCTCCTC CACGCTCCTG
3751 TCCATGAGCA TGGCCCCGCA CAAGGCCCAC ACCTTGGCTC CTGGCCAGAT
3801 CGACCCTGAG GCCACCTGTC CAGCCTGCAG CCTGGATGTG AGCCATCAGG
3851 TCAGCACGCT GGTGCGGCGC TATGAGCAAC TCCAAGACAT GGTCAACAGC
3901 CTGGCCGTCT CCCGACCCTC CAAGAAGGCC AAGCTCCAGA GACAGGACGA
3951 GGAGCTGCTG GGCCGTGTGC AGAGTGCCAT CCTGCAGGTG CAGGGTGACT
4001 GCGAGAAGCT CAACATCACC ACCAGCAACC TCATCGAGGA CCATCGGCAG
4051 AAACAGAAGG ACATTGCTAT GCTGTACCAG GGTCTGGAGA AGCTCGAAAA
4101 GGAAAAGGCC AACAGGGAGC ACCTGGAGAT GGAGATCGAT GTGAAAGCCG
4151 ACAAGAGTGC TCTGGCCACC AAAGTGAGCC GTGTCCAGTT TGATGCCACC
4201 ACGGAGCAGC TGAACCACAT GATGCAGGAG CTGGTGGCCA AGATGAGCGG
4251 GCAGGAGCAG GACTGGCAGA AGATGCTGGA CAGGCTGCTC ACAGAGATGG
4301 ACAACAAGCT GGACCGCCTG GAGCTGGACC CAGTGAAGCA GTTGCTGGAG
4351 GATCGGTGGA AATCGCTGCG ACAGCAGCTC AGGGAGCGCC CCCCACTCTA
4401 CCAGGCAGAC GAGGCGGCTG CCATGCGGAG GCAGCTCCTG GCACATTTCC
4451 ACTGCCTCTC ATGTGACCGG CCCTTGGAGA CACCTGTGAC TGGACATGCC
4501 ATCCCCGTGA CCCCCGCGGG TCCAGGCCTA CCTGGGCACC ATTCCATCCG
4551 CCCCTACACG GTGTTTGAAC TGGAGCAGGT CCGGCAGCAT AGCCGCAACC
4601 TCAAGCTGGG CAGCGCCTTC CCTCGGGGTG ACCTGGCGCA GATGGAGCAG
4651 AGCGTGGGGC GCCTGCGCTC CATGCACTCC AAGATGCTGA TGAACATTGA
4701 GAAGGTGCAG ATCCACTTCG GGGGCTCCAC CAAGGCCAGC AGCCAGATAA
4751 TCCGCGAGCT GCTGCACGCC CAGTGCCTGG GCTCCCCCTG CTACAAACGG
4801 GTGACAGATA TGGCTGATTA CACCTACTCA ACTGTGCCCC GGCGCTGCGG
4851 GGGCAGCCAC ACCCTCACCT ACCCCTACCA CCGCAGCCGC CCGCAGCACC
4901 TTCCCCGGGG CCTGTATCCT ACTGAAGAGA TCCAGATTGC CATGAAGCAT
4951 GATGAGGTGG ACATCTTGGG CCTGGATGGC CACATTTACA AGGGACGGAT
5001 GGACACAAGG CTGCCAGGCA TCCTCCGAAA AGACAGCTCA GGGACCTCAA
5051 AGCGCAAGTC CCAGCAGCCC AGGCCCCACG TGCACAGGCC GCCATCCCTC
5101 AGCAGCAATG GCCAGCTGCC CTCTCGGCCA CAGAGCGCCC AGATTTCGGC
5151 TGGCAACACC TCAGAAAGAT AGACCTTCCT CCGAGGGCCG TCTCTCCCAG
5201 CCGAACACAG CCCACCCGCC CAGCTCCGCC TCGGTGGCAA ACAGGGGGCT
5251 GGAGAGGCAC GTGGACATGC CTCCTGGGGA GGGGCTCGAG GAGCCCACGC
5301 GGGGGCCGCG GTCCAGCACC GCTCAGTGAG CGGAGGTGTA AATAAACATT
5351 CAGGAGGAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
5401 AAAAAAAAAA A
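The nucleotide listings in this document share a fixed layout: each row starts with the 1-based coordinate of its first base, followed by up to five 10-base blocks (50 bases per row). Because many printed coordinates were misread during digitisation, checking every row label against the running base count is a cheap consistency test. The sketch below assumes whitespace-separated blocks supplied in order; `parse_listing` is a hypothetical helper, not part of any tool named in this document.

```python
def parse_listing(rows: list[str], first: int = 1) -> str:
    """Rebuild a nucleotide sequence from rows such as
    '1601 TGATCCAGCC TGGCACAGGT CAGCATGATT TGGTCCAATC TGGCACAGGT'.

    The leading number of each row must equal the coordinate of the row's
    first base; a mismatch usually means a misread or missing row.
    """
    blocks: list[str] = []
    expected = first
    for row in rows:
        fields = row.split()
        if not fields:
            continue  # tolerate blank lines between rows
        pos = int(fields[0])
        if pos != expected:
            raise ValueError(f"row labelled {pos}, expected {expected}")
        for block in fields[1:]:
            blocks.append(block.upper())
            expected += len(block)
    return "".join(blocks)


tail = parse_listing(
    [
        "5351 CAGGAGGAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA",
        "5401 AAAAAAAAAA A",
    ],
    first=5351,
)
print(5351 + len(tail) - 1)  # 5411, the last coordinate of this listing
```

The last coordinate obtained this way should agree with the stated insert length of the clone.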

BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 181 bp to 5169 bp; peptide length: 1663
Category: putative protein
Classification: no clue

Prosite motifs: RGD (1482-1484)
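PROSITE's RGD entry is the bare tripeptide R-G-D, so locating it needs nothing more than a substring scan. The sketch below (hypothetical helper `find_rgd`) returns 1-based coordinates and accepts an offset so that a fragment can be scanned in full-protein numbering; the test fragment is residues 1471-1487 of this peptide as read from the nucleotide listing. A match by itself says nothing about whether the motif is exposed or functional.

```python
import re


def find_rgd(protein: str, offset: int = 0) -> list[tuple[int, int]]:
    """Return 1-based (start, end) coordinates of every R-G-D tripeptide.

    `offset` is added to both coordinates, so a fragment that starts at
    residue N of the full protein can be scanned with offset=N - 1.
    """
    return [
        (m.start() + 1 + offset, m.start() + 3 + offset)
        for m in re.finditer("RGD", protein.upper())
    ]


# Residues 1471-1487 of the peptide, scanned in full-protein numbering.
print(find_rgd("SRNLKLGSAFPRGDLAQ", offset=1470))  # [(1482, 1484)]
```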



1 MKDAAEELSF ARVLLflRVDE LEKLFKDREfl FLELVSRKLS LVPGAEEVTM 

45 51 VTWEELEdAI TDGUIRASflAG SETLMGFSKH GGFTSLTSPE GTLSGDSTKfl 

101 PSIEGALDSA SGLGPDRTAS GSGGTAHPSD GVSSREflSKV PSGTGRfiflflP 

151 RARDEAGVPR LH<2SSTF<2FK SDSDRHRSRE KLTSTdPRRN ARPGPV<3<2DL 

B01 PLARDflPSSV PAS<3S<2VHLR PDRRGLEPTG MN(2PGLVPAS TYPHGVVPLS 

BS1 MGflLGVPPPE MDDRELIPFV VDEflRMLPPS VPGRD(2<3GLE LPSTDflHGLV 

50 301 SVSAYflHGHT FPGTDflRSME PLGMDdSRGCV ISGMG(2(3GLV PPGID(2<26LT 

351 LPVVDtfHGLV LPFTDflHGLV SPGLMPISAD (2<3GFV<2PSLE ATGFItJPGTE 

401 UHBLIflSGRF URALVflRGAY APGLVtJPGAD flRGLVRPGHD (JSGLAflPGAD 

451 flRGLVUPGMD flSGLACPGRD flHGLIflPGTG flHDLVflSGTG fiGVLViJPGVD 

501 (3PGMVAPGRF (JRALVtJPGAY flPGLVflPGAD fllDVVfiPGA]) fiHGLVflSGAD 

55 551 (2SJ)LAf3PGAV flHGLVflPGVD ARGLAflPRAD HflRGLVPPGA DflRGLVflPGA 

bOl DiJHGLVflPGV DflHGLAflPGE VflRSLVfiPGI VflRGLVQPGA VURGLVflPGA 

b51 V<2RGLV<2PGV DflRGLVflPGA VC3RGLV<2PGA VflHGLVflPGA PflRGLVflPGV 

701 D(2RGLV(3PGV DURGLVOPGM D<3RGLI(2PGA DfiPGLVCPGA GflLGMVflPGI 
751 GQQGMVQPQA DPHGLVQPGA YPLGLVQPGA YLHDLSQSGT YPRGLVQPGM

flOl DflYGLRflPGA YflPGLIAPGT KLRGSSTFfiA DSTGFISVRP YflHGHVPPGR 

fiSl EflYGflVSPLL ASflGLASPGI DRRSLVPPET YflflGLMHPGT DflHSPIPLST 

GLGSTHPD(2(2 HVASPGPGEH DflVYPDAAflH GHAFSLFDSH DSMYPGYRGP 

5 151 GYLSAD<2HG<2 EGLDPNRTRA SDRHGIPAtJK APGflDVTLFR SPDSVDRVLS 

1001 EGSEVSSEVL SERRNSLRRM SSSFPTAVET FHLMGELSSL YVGLKESNKD 

IQSi LDEEflAGflTD LEKI(3FLLA(J MVKRTIPPEL flEnJLKTVKTL AKEVUflEKAK 

1101 VERLtJRILEG EGNC3EAGKEL KAGELRLULG VLRVTVADIE KELAELRESfl 

1151 DRGKAAMENS VSEASLYLflD (2LPKLRMIIE SflLTSSSTLL SPISPIAPHKAH 

10 1B01 TLAPGGIDPE ATCPACSLDV SHflVSTLVRR YE<3L<2DMVNS LAVSRPSKKA 

1551 KL<2R(JDEELL GRVfiSAILflV (JGDCEKLNIT TSNLIEDHRfl KdKDIAMLYfl 

1301 GLEKLEKEKA NREHLEMEID VKADKSALAT KVSRVflFDAT TEtSLNHflMflE 

1351 LVAKIISGUEfl DltlflKNLDRLL TEMDNKLDRL ELDPVKlJLLE DRh)KSLR(2<2L 

1M01 RERPPLYfiAD EAAAflRRtJLL AHFHCLSCDR PLETPVTGHA IPVTPAGPGL 

15 1MS1 PGHHSIRPYT VFELEflVRflH SRNLKLGSAF PRGDLAflMEfl SVGRLRSMHS 

1501 KMLMNIEKVt? IHFGGSTKAS SflllRELLHA (SCLGSPCYKR VTDMADYTYS 

. 1551 TVPRRCGGSH TLTYPYHRSR PUHLPRGLYP TEEIfllANKH DEVDILGLDG 

IbOl HIYKGRMDTR LPGILRKDSS GTSKRKSfllJP RPHVHRPPSL SSNGflLPSRP 

IbSl (3SA(2ISAGNT SER 

20 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_lbp3, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_lbp3, frame 1

Report for DKFZphtes3_lbp3.1

[LENGTH] 1723

IE II lill ia735H. c ifi

EpIJ b-n

[HOMOL] TREMBL:AFOSSMbl^ gene: "M01D1.5"; Caenorhabditis elegans cosmid M01D1. 1e-47

[FUNCAT] 30.03 organization of cytoplasm [S. cerevisiae, YDL058w] 8e-07

[FUNCAT] 08.07 vesicular transport (golgi network, etc.) [S. cerevisiae, YDL058w] 8e-07

[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 2e-04

[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YKR015w] 0.001

[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YKR015w] 0.001

[BLOCKS] PROIO^SC

[BLOCKS] BP0230AD

[BLOCKS] PR005M3H

[BLOCKS] PR00210G

[BLOCKS] PR0D210E

[BLOCKS] BP0M23tA
[PIRKW] RNA binding 3e-06

[PIRKW] hydroxylysine 5e-10

[PIRKW] endoplasmic reticulum 7e-18

[PIRKW] ATP 2e-06

[PIRKW] phosphoprotein 3e-06

[PIRKW] seed 4e-34

[PIRKW] saliva 2e-10

[PIRKW] glycoprotein 2e-10

[PIRKW] heterotrimer 3e-06

[PIRKW] alternative splicing 2e-10

[PIRKW] P-loop 4e-06

[PIRKW] storage protein 1e-34

[PIRKW] extracellular matrix 2e-10

[PIRKW] membrane protein 7e-18

[PIRKW] protein biosynthesis 7e-18

[SUPFAM] myosin motor domain homology 2e-06

[SUPFAM] elastin 2e-10

[SUPFAM] glutenin 5e-37

[SUPFAM] myosin heavy chain 2e-06

[SUPFAM] unassigned ribonucleoprotein repeat-containing proteins 3e-06

[SUPFAM] proline-rich protein 5e-10

[SUPFAM] ribonucleoprotein repeat homology 3e-06

[PROSITE] RGD 1

EKU3) All_Alpha 

25 EKIO L0UI_C0I1PLEXITY '/. 

CKU3 C0ILED_C0IL L&O V. 



SE<3 GGl2VEDLSKflLKRVDG<2V<3GIATHV<aHFS<MSGLDLAALEIi)PEEflEVGVRAFDRv-RTGSI 

30 SEG 

PRD cccchhhhhhhhhhhhheeeeeeeeeeccccccchhhhhhhhccceeeeeeeeeeecccc 
COILS 



35 SEfi nKDAAEELSFARVLLflRVDELEKLFOREflFLELVSRKLSLVPGAEEVTrivTUEELEfiAI 

SEG 

PRD chhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhh 
COILS 



40 

SE(2 TDGlilRAScJAGSETLIIGFSKHGGFTSLTSPEGTLSGDSTKflPSIElJALDSASGLGPDRTAS 

SEG 

PRD hhhhccccccceeeeeccccccccccccccccccccccchhhhhhhhhhccccccceeec 
COILS 



SE<2 GSGGTAHPSDGVSSRE(2SKVPSGTGRflflfiPRARDEAGVPRLHl3SSTFflFKS»S»RHRSRE 

SEG 

PRD cccccccccceeeccccccccccccccchhhhhhhhccchhhhhcccccccccccccccc 
50 COILS 



SE(3 KLTSTflPRRNARPGPVflflDLPLARDtJPSSVPASfiSflVHLRPDRRGLEPTGtlNflPGLVPAS 

SEG xxxxxxxxxxxx 

55 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 
SEQ TYPHGVVPLSMGQLGVPPPEMDDRELIPFVVDEQRMLPPSVPGRDQQGLELPSTDQHGLV

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

5 

SE<3 SVSAY<2HGMTFPGTDaRSnEPLGMD<2RGCVISGMG<3<2GLVPPGID<J<2GLTLPVVD<2HGLV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
10 COILS 



SE<3 LPFTD(2HGLVSPGLI1PISAD<J<2GFV<2PSLEATGFI<2PGTE(2HDLI<2SGRF<JRALV(3RGAY 

SEG 

15 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



SE<2 <2PGLV(2PGADl3RGLVRPGnDl2SGLA<2PGADaRGLVlilPGI1D<3SGLA<2PGRD(2HGLI<2PGTG 

20 SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

25 SE<2 (2HDLV(3SGTG(JGVLViJPGVD(3PGriV(2PGRF(2RALV(2PGAYfiPGLVflPGADl3IDVV(2PGAD 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

30 

SEC (3HGLV(3SGAD(2SDLA(3PGAV(3HGLV(3PGVD(3RGLA(3PRADH(2RGLVPPGAD(3RGLV(3PGA 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

35 

SE<2 D(2HGLV(2PGVD(2HGLAflPGEV(2RSLV(2PGIV(3RGLV(3PGAV(3RGLV(3PGAV(aRGLV(3PGV 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
40 COILS 

SE<2 DdRGLV<2PGAVflRGLVdPGAVflHGLV<2PGADflRGLVflPGVDflRGLVflPGVDflRGLVflP6l1 

SEG 

45 PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



SE<2 D(2RGLI(JPGAD(JPGLV(3PGAG(3LG(1V(3PGIGfl(2Gf1V(3Pl2ADPHGLV(2PGAYPLGLVaPGA 

50 SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 

55 SE(3 YLHDLS(2SGTYPRGLV(2PGriD(JYGLR(3PGAYflPGLIAPGTKLRGSSTF(3ADSTGFISVRP 

SEG • 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
SEQ YQHGMVPPGREQYGQVSPLLASQGLASPGIDRRSLVPPETYQQGLMHPGTDQHSPIPLST

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



10 SE<2 GLGSTHPD(2i2HVASPGPGEHD(2VYPDAA(2HGHAFSLFDSHDS[1YPGYRGPGYLSAD<2HG(2 

SEG 

PRD cccccccccccccccccccccccccccccccccccccccccccccccccccccccccccc 
COILS 



15 

SE(3 EGLDPNRTRASDRHGIPAfiKAPGflDVTLFRSPDSVDRVLSEGSEVSSEVLSERRNSLRRII 

SEG xxxxxxxxxxxxxxxxxxxxxxx- 

PRD ccccccccccccccccccccccccceeeeeccccccccccccchhhhhhhhhhhcccccc 
COILS 

20 

SEA SSSFPTAVETFHLHGELSSLYVGLKESMKDLDEE(2AG(2TDLEKI(2FLLA(2H\/KRTIPPEL 

SEG 

PRD cccccceeeeeeeeeccceeehhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhcchhh 
25 COILS 



SE(3 (3E<3LKTVKTLAKEVId(3EKAKVERL(2RILEGEGN(2EAGKELKAGELRL(2LGVLRVTVADIE 

SEG 

30 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCC 

SE<3 KELAELRES(2DRGKAAriENSVSEASLYLl2D(2LDKLRf1IIESI1LTSSSTLLSf1SI1APHKAH 

35 SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhchhhhhcccchhhhhhhhhhhhhhhhhhhhhhccccceeeehhhhhhhh 
COILS 

CCCCCCCCC 

40 SEfl TLAPG(3IDPEATCPACSLDVSH<SVSTLVRRYE<2L<3DI1VNSLAVSRPSKKAKL<2Rl2DEELL 

SEG 

PRD hhcccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhhh 
COILS 



45 

SE(3 GRVflSAILflVi2GDCEKLNITTSNLIEDHRl3Ki2KDIAI1LY<3GLEKLEKEKANREHLEI1EID 

SEG 

PRD hhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



SE<3 VKADKSALATKVSRVl2FDATTE(2LNHi1l1(!ELVAKnSGl2E(2Dld(3KMLDRLLTEHDNKLDRL 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhh 
55 COILS 



SE<2 ELDPVK(3LLEDRklKSLRi2(2LRERPPLYt3ADEAAAMRR(3LLAHFHCLSCDRPLETPVTGHA 
SEG

PRD hhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhcccccccccccccee 
COILS 

5 ' * ' " ' ......... 

SE<3 IPVTPAGPGLPGHHSIRPYTVFELEflVRtfHSRNLKLGSAFPRGDLAflflEflSVGRLRSriHS 
SEG 

PRD eeeecccccccccccccccchhhhhhhhhhhhhhcccccccccchhhhhhhhhhhhhhhh 
COILS 

10 

SEfl KMLI1NIEKV(3IHFGGSTKASS(3IIRELLHA(3CLGSPCYKRVTDI1ADYTYSTVPRRCGGSH 

SEG 

PRD hhhhhheeGeeecccccchhhhhhhhhhhhhhcccccceeeccccccceeeccccccccc 
15 COILS 



SEfl TLTYPYHRSRP<2HLPRGLYPTEEI(2IAMKHDEVDILGLDGHIYKGRnDTRLPGILRKDSS 
SEG 

20 PRD ccccccccccccccccccccchhhhhhhhhcceeeeccccceeeecccccccceeecccc 
COILS 



SE<3 GTSKRKSfl<2PRPHVHRPPSLSSNG<2LPSRP<2SAflISAGNTSER 

25 SEG 

PRD cccccccccccccccccccccccccccccccceeeeecccccc 
COILS 



30 

Prosite for DKFZphtes3_lbp3.1

PS00016 1542->1544 RGD PDOC00016

35 



(No Pfam data available for DKFZphtes3_lbp3.1) 
group: transmembrane protein

DKFZphtes3_17i21 encodes a novel 224 amino acid protein without
similarity to known proteins.

The novel protein contains two transmembrane regions. ESTs are
found in testis, retina and brain.
No informative BLAST results; no predictive PROSITE, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes and as a new marker for
testicular cells.
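The entry does not say how the two transmembrane regions were predicted (the note below attributes the call to Pedant). To illustrate the general idea only, a classic Kyte-Doolittle hydropathy scan flags stretches whose 19-residue window average exceeds about 1.6. This is a swapped-in textbook method, not necessarily what Pedant used, and the window size and threshold are assumptions.

```python
# Kyte & Doolittle (1982) hydropathy index.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_scores(protein: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every complete window along the sequence."""
    p = protein.upper()
    return [
        sum(KD[aa] for aa in p[i:i + window]) / window
        for i in range(len(p) - window + 1)
    ]


def candidate_tm_segments(protein: str, window: int = 19,
                          threshold: float = 1.6) -> list[tuple[int, int]]:
    """Merge overlapping above-threshold windows into 1-based (start, end) spans."""
    segments: list[tuple[int, int]] = []
    for i, score in enumerate(window_scores(protein, window)):
        if score < threshold:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1] + 1:
            segments[-1] = (segments[-1][0], end)  # extend the current span
        else:
            segments.append((start, end))
    return segments


print(candidate_tm_segments("K" * 10 + "L" * 20 + "K" * 10))  # [(6, 35)]
```

The test input is synthetic. On real sequences, a hydrophobic signal peptide (which the Pedant note also mentions) can score like a transmembrane segment.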



unknown protein

Pedant: contains signal peptide (frame 1) and TRANSMEMBRANE 2 (frame 2)

perhaps complete cds.
Sequenced by GBF
Locus: unknown

Insert length: 1518 bp

Poly A stretch at pos- 1480-1 polyadenylation signal at pos. mst 
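AATAAA is the canonical polyadenylation signal hexamer and ATTAAA its most common variant; the poly A stretch is the terminal run of A's. The sketch below (hypothetical helpers `find_signals` and `polya_tail_start`) reports 1-based coordinates of the first base. The document does not state its own convention, and its numbers may differ by one from these: for this clone it gives 1480 for the poly A stretch, the base just before the first A found here. The test input is row 1451 onward of this clone's listing.

```python
import re
from typing import Optional


def find_signals(seq: str, motifs=("AATAAA", "ATTAAA")) -> list[tuple[int, str]]:
    """1-based start positions of candidate polyadenylation signals (overlaps allowed)."""
    seq = seq.upper()
    hits = [
        (m.start() + 1, motif)
        for motif in motifs
        for m in re.finditer(f"(?={motif})", seq)
    ]
    return sorted(hits)


def polya_tail_start(seq: str, min_len: int = 10) -> Optional[int]:
    """1-based start of a terminal run of at least `min_len` A's, or None."""
    m = re.search(rf"A{{{min_len},}}$", seq.upper())
    return m.start() + 1 if m else None


tail = "CGCTAATAAAAGACGATCTGCGTGAACGCC" + "A" * 38  # positions 1451-1518
offset = 1450
print([(pos + offset, motif) for pos, motif in find_signals(tail)])  # [(1455, 'AATAAA')]
print(polya_tail_start(tail) + offset)  # 1481
```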



1 GCCAGACAGC TAGGTGTCAT TCAGGGCTGG TGTCCTCTGT CCAGGCCATC
51 ATGGCCTCCA CTGCCGGCTA CATCGTCTCC ACCTCCTGCA AGCACATCAT
101 TGATGACCAA CACTGGCTGT CCTCTGCCTA CACGCAATTT GCTGTGCCCT
151 ACTTCATCTA CGACATCTAC GCCATGTTCC TCTGTCACTG GCACAAGCAC
201 CAGGTCAAAG GGCATGGAGG GGACGACGGA GCGGCCAGAG CCCCGGGCAG
251 CACGTGGGCC ATAGCGCGTG GCTACCTGCA CAAGGAGTTC CTCATGGTGC
301 TCCACCATGC CGCCATGGTG CTGGTGTGCT TCCCACTCTC AGTGGTGTGG
351 CGACAGGGTA AGGGAGACTT CTTTCTGGGT TGCATGTTGA TGGCAGAGGT
401 CAGCACGCCC TTCGTCTGCC TTGGCAAGAT CCTCATCCAG TACAAGCAGC
451 AGCACACACT GCTGCACAAG GTGAACGGGG CCCTGATGCT GCTCAGCTTC
501 CTCTGCTGCC GGGTGCTGCT CTTTCCCTAC CTGTACTGGG CCTACGGGCG
551 CCATGCCGGC CTGCCCCTGC TGGCCGTGCC CCTGGCCATC CCTGCCCACG
601 TCAACCTGGG CGCTGCGCTG CTCCTGGCCC CTCAGCTCTA CTGGTTCTTC
651 CTCATCTGCC GTGGGGCCTG CCGCCTCTTC TGGCCCCGCT CCCGGCCGCC
701 CCCGGCCTGC CAGGCCCAGG ACTGAGGCCG GGGGCCGGGA CCCTCCCCCT
751 CCCCACCCCC ACCCCCGTGG AGACAGGGCT CTGGGGCTGA TGGCTGGGGT
801 TGGGAGCCAG GGTCCTCTTG CCCGGACAAC CCCAGGACTG ACGATGACCC
851 CGAAAGGGAA GAGGCCCCAT CTCTCGGGGA CTGAGGGGGT GGAGAGAGGG
901 GACCTCTTCC CCCTACTCTG CCCCCTTCCT GCACACCCTT GCGCTGGAGG
951 AGGGGAGGGG GCACCGCCTC CCACCCACTG AGGGCAGGAG GGCTTGTGGG
1001 GAGGGACACC AACAGGGTTT CAAGGGGACC AGGAGTCAGA ATGTGGGGAG
1051 ACGCCTGTGC CAAGGCCATC CCAGCCCCTA TGCTGCCATC CCCCAGGGCT
1101 CCCCATCACC CGAGAGGAGA GGACGCCCCA ACTAACCCCC GCTGGCCCTC
1151 GGGCCTCCCG AGTGGCCGGC TGCAACCACG GCTCCTCTCC AGGGTAGGCC
1201 AGCTTGAGGA ATCTTATTTA TTTTATTTAT TTACCCAAAT TTGAACTAGT
1251 CTGTTGGGTT GGGGGAAGGA GGTGGCTGCT ACCCCCAAGC CTTCCCAGTG
1301 CTGACAACCC CGGGGGCAGG CGAGGGCGCC CAGTCCCTCA CCATCGGCTG
1351 CACATCGCGC CCTCGGGCCC TGCCATGTCC CTGGTGCTAC TGACCTCTCA
1401 AGGCTTCCTC CAATCTGGGG TCGGGGGACC CTGGGAGGTG CTTTACAGAC
1451 CGCTAATAAA AGACGATCTG CGTGAACGCC AAAAAAAAAA AAAAAAAAAA
1501 AAAAAAAAAA AAAAAAAA



BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 51 bp to 725 bp; peptide length: 224
Category: putative protein

Classification: Transmembrane proteins unclassified
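The ORF coordinates and the peptide length are related by simple codon arithmetic: 725 - 51 + 1 = 675 nt, i.e. 225 codons, one more than the 224 residues listed, because here the end coordinate includes the TGA stop codon (nt 723-725 of the listing). A minimal Python sketch of that arithmetic and of the translation itself follows; it assumes the standard genetic code (NCBI table 1), which the source does not state explicitly, and `translate` and `peptide_length` are illustrative names. The test uses the first 100 nt of this clone's listing.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq: str, start: int, end: int) -> str:
    """Translate seq[start..end] (1-based, inclusive); '*' marks a stop codon."""
    region = seq.upper()[start - 1:end]
    if len(region) % 3:
        raise ValueError("region length is not a multiple of 3")
    return "".join(CODON_TABLE.get(region[i:i + 3], "X")
                   for i in range(0, len(region), 3))


def peptide_length(start: int, end: int, end_includes_stop: bool) -> int:
    """Residue count implied by 1-based ORF coordinates."""
    codons, remainder = divmod(end - start + 1, 3)
    if remainder:
        raise ValueError("ORF length is not a multiple of 3")
    return codons - 1 if end_includes_stop else codons


first_100 = (
    "GCCAGACAGCTAGGTGTCATTCAGGGCTGGTGTCCTCTGTCCAGGCCATC"
    "ATGGCCTCCACTGCCGGCTACATCGTCTCCACCTCCTGCAAGCACATCAT"
)
print(translate(first_100, 51, 80))   # MASTAGYIVS
print(peptide_length(51, 725, True))  # 224
```

The `end_includes_stop` flag is there because the entries in this section do not all appear to follow the same convention for the ORF end coordinate.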

1 MASTAGYIVS TSCKHIIDDQ HWLSSAYTQF AVPYFIYDIY AMFLCHWHKH

51 QVKGHGGDDG AARAPGSTWA IARGYLHKEF LMVLHHAAMV LVCFPLSVVW

101 RQGKGDFFLG CMLMAEVSTP FVCLGKILIQ YKQQHTLLHK VNGALMLLSF

151 LCCRVLLFPY LYWAYGRHAG LPLLAVPLAI PAHVNLGAAL LLAPQLYWFF

201 LICRGACRLF WPRSRPPPAC QAQD

35 



BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_17i21, frame 3
No Alert BLASTP hits found

Pedant information for DKFZphtes3_17i21, frame 3

Report for DKFZphtes3_17i21.3

[LENGTH] 224

[MW] 25221.11

Cpll T-03 

[HOMOL] TREMBLNEW:AFlfilbtb_l gene: "BcDNA:GH12326"; product: "BcDNA:GH12326"; Drosophila melanogaster BcDNA.GH02340 (BcDNA:GH02340) mRNA, complete cds. 1e-20

[BLOCKS] PR00b32H
[BLOCKS] PROMDMA
[BLOCKS] BLDIE^C

TRANSMEMBRANE 2
[KW] LOW_COMPLEXITY 6.25 %



SEQ MASTAGYIVSTSCKHIIDDQHWLSSAYTQFAVPYFIYDIYAMFLCHWHKHQVKGHGGDDG

SEG 

PRD cccceeeeeccccceeecchhhhhhhhhhheeehhhhhhhhhhhhhhhhhhccccccccc 



10 MEM 



SEQ AARAPGSTWAIARGYLHKEFLMVLHHAAMVLVCFPLSVVWRQGKGDFFLGCMLMAEVSTP

SEG 

PRD ccccccceeeeecccchhhhhhhhhhhhhhhhcccceeeescccccchhhhhhhhhhccc 

15 MEn rinriririnriririfiMiinnnriri 

SEQ FVCLGKILIQYKQQHTLLHKVNGALMLLSFLCCRVLLFPYLYWAYGRHAGLPLLAVPLAI

SEG xxxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhccchhhhhhhhhhhhheeecceeeeccccccccceeeeccc 

20 meh niiririnriririnririiinfinfiri 

SEQ PAHVNLGAALLLAPQLYWFFLICRGACRLFWPRSRPPPACQAQD

SEG xx 

PRD cchhhhhhhhhhhccceeeeeecccccccccccccccccccccc 



25 MEM 



(No Prosite data available for DKFZphtes3_17i21.3)
(No Pfam data available for DKFZphtes3_17i21.3)
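The SEG, PRD, COILS and MEM rows above are per-residue tracks aligned under the SEQ rows, and the [KW] percentages in the report header appear to be simple counts over those tracks: the SEG rows for this entry flag 12 + 2 = 14 positions, and 14/224 = 6.25 %, consistent with the LOW_COMPLEXITY value in the header. The sketch below assumes that reading; the PRD symbol meanings (h helix, e strand, c coil) are the usual ones for secondary-structure strings and are not defined in the source. Function names are illustrative.

```python
from collections import Counter


def track_percent(track: str, symbol: str, length: int) -> float:
    """Percentage of `length` residues marked with `symbol` in an annotation track."""
    return round(100 * track.count(symbol) / length, 2)


def prd_composition(prd: str) -> dict[str, float]:
    """Fractions of helix (h), strand (e) and coil (c) in a PRD string."""
    counts = Counter(prd.lower())
    total = sum(counts[k] for k in "hec")
    return {k: counts[k] / total for k in "hec"}


seg = "x" * 12 + "x" * 2  # the two SEG runs of this entry
print(track_percent(seg, "x", 224))  # 6.25
```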
group: transcription factors

DKFZphtes3_lflnm encodes a novel 377 amino acid protein with
similarity to human giantin.

Giantin has been discussed as an autoantigen in rheumatoid arthritis.
The novel protein contains a leucine zipper and a putative
helix-loop-helix DNA-binding domain; it might therefore be a novel
transcription factor. Most EST hits are from testis and germ
cells.

The new protein can find application in modulation of gene
expression and in expression profiling.



unknown protein

see DKFZphtes3_30i23
wrong orientation
perhaps complete cds.

Sequenced by MediGenomix
Locus: /chromosome="ltj n

Insert length: 5282 bp

Poly A stretch at pos. 5242, polyadenylation signal at pos. 5227



1 CCGGCACCCG GAGCTCCTGG GCACACGGCA TTGGCAGGGG CCGCTTCGGC
51 AGAGTGATGA CTGATGATGA GTCCGAGAGC GTCCTCTCCG ACTCCCATGA
101 AGGGTCGGAG CTGGAGCTGC CTGTTATCCA GCTGTGCGGG CTGGTGGAGG
151 AGCTCAGCTA TGTAAACTCT GCTCTCAAAA CTGAGACTGA GATGTTTGAG
201 AAATATTACG CTAAACTGGA GCCCAGGGAT CAGCGACCTC CACGATTATC
251 AGAAATTAAA ATATCAGCAG CAGATTATGC ACAGTTTCGA GGCAGGCGTA
301 GATCCAAATC CCGGACAGGT ATGGACCGTG GGGTAGGCCT GACTGCCGAC
351 CAAAAACTTG AGCTGGTACA AAAAGAGGTT GCGGACATGA AGGATGACTT
401 ACGACACACA AGGGCAAATG CGGAACGCGA CCTGCAGCAT CACGAGGCGA
451 TCATTGAGGA GGCTGAAATT CGATGGAGTG AAGTTTCGAG AGAAGTGCAT
501 GAGTTTGAAA AAGATATTCT AAAAGCCATA TCCAAGAAGA AAGGGAGTAT
551 TTTGGGCACT CAGAAAGTGA TGAAATACAT TGAGGACATG AACCGCCGGA
601 GGGATAATAT GAAGGAGAAA TTACGTTTGA AAAATGTTTC TCTCAAAGTT
651 CAGAGGAAAA AAATGCTTTT ACAATTGAGG CAGAAGGAAG AGGTGAGTGA
701 GGCCCTTCAC GATGTTGATT TTCAGCAGTT GAAGATAGAG AACGCTCAAT
751 TTCTTGAGAC AATTGAAGCA AGGAATCAAG AACTGACCCA GCTAAAGCTG
801 TCATCTGGAA ACACTCTGCA GGTTCTCAAT GCCTACAAAA GCAAGCTTCA
851 CAAGGCAATG GAAATATACC TCAATCTGGA CAAGGAGATC TTGCTGAGAA
901 AAGAGCTACT TGAAAAAATT GAAAAAGAAA CACTACAAGT AGAGGAGGAC
951 CGGGCCAAAG CCGAGGCAGT GAATAAGAGG CTCCGGAAGC AGCTGGCCGA
1001 GTTCCGGGCA CCACAGGTGA TGACTTACGT CCGGGAGAAG ATCTTAAATG
1051 CGGACCTGGA GAAGAGCATC AGGATGTGGG AAAGGAAAGT GGAGATAGCA
1101 GAGATGTCCT TAAAAGGCCA TCGTAAGGCT TGGAATCGAA TGAAAATAAC
1151 CAATGAGCAG TTGCAGGCAG ATTACCTTGC TGGGAAGTAG CCAGAGGCAG
1201 GCCACGGCTT ACAGACCACT ACATGACCTA TAAAAGTAAT CAGCTCCTTT






1251 CTAGTCACGG GCTCCTCTCA CTGTTCCCTG TCTGCCTGGT GTTCCCAACC
1301 CCCCACCCAG GCTGAGTATC ATCTCCTGGG CCACATCTGC CCATGGGGAG
1351 TGTTTTCACA GCCTGGCCCC TGGAACTGTT ACCACTGAAA GAACCACAGG
1401 GCACTCTAAT GGTTTGACAC TTGTTAGCCA GCATTTAGTT CACAAGCATA
1451 GTGAAAGTGA CCTTCCCACA CCTGGGAGAG GGATAGAGGA GGGAGAGCCA
1501 GCCCAGTGTA TGCCATGGGC TTATCCGTGG CAGCCCCAGT GTGCAACTAT
1551 CAAAAACAGA CATCAAAACA GCATGGTGAA TGCCTGGCAC TCAGCATTCT
1601 CAGTTTACTC TTCAGTTTGG TGGGGTAGCT CCTGGACTAG ATACTGCTGC
1651 AAAAGAAAAC AAGCACGAAG GAAACCAAGA TGATTTCTTC GGGCTGATAC
1701 AACCTGTTCT GACCTGCAAA AATCCTACCT TCCCCCACCT CCCCACCGTA
1751 ATAGTCATAG TATAAGGGTT GTACAGACGC CTCAGGAGAC CTGCCTGATT
1801 CCTTTACATC CTTCTCCCTA ACATCTAGAC TATCTCTAGA GCTGTTTCCT
1851 AGTCGTGAAT GCGTGATGGT CCTTCTTTGT CCCTGCAAGT ATGATCCAAC
1901 ATGGCCCAGT TCAGAATCAG AATATGTCTT CTGTGTCATG GTGGCATTTG
1951 GTCCATGGTG GGAGAAAGAA ATCAACTTTT CCCAGTGGTG GAGTGAGGAC
2001 AGGGGAGGGC CGGCCCTCTC AGCCTTGGAT GTGATCCATT TGCTGTAGTC
2051 TTCCACCTTG GTGTACAGAA ACAGGCCAGG GCACGTCTCA CCACCGAAGT
2101 TCAGGACTCC TCTCAGAACC CACAGATCGA ACTGCTGTAG CTGGCACATC
2151 ATTGGGCTTC CTGGGTCCCC CTGTGATAAA AGACAGAAGG CTTCAAGTCT
2201 TAGAAAAACT AGTTTTTGTT GTAAATCTAT CCTTGTGCAA TATACTGTTT
2251 GTTCTAGAAA TGTTTTACGC TGGTTCTCAC TGGAAATGGG GCAAATTATA
2301 GGATACAATT TCAAATCTAG GCAGCCACCA CCACAAATTC CAACAAGATG
2351 ACTTTTCCTT TTATTATGCA AATTAGCTGT GGACTTCTGC TGATTGCCTA
2401 TAGCTTCCTG GTTCATATTT CATTTTCTTG CCCCTTTCCA GTCCTTTGGC
2451 CAAACCTTCC CTCTCTTCTG GCTTCTCATT CCTGAAATGT TGGTGTTTGT
2501 TTCTGTTTTG TCCTGAAATG CTCACATTTT CCCTTCTCTG CCTTGCTTCA
2551 ACCCTTAGTG TAAGCCACTT CCTGCCACCT GGCAACTGCT TACCAGCCTG
2601 GCTGGCCGTG CTCTGGGTCT TCCCTACTCC CAATGGAGCA GTCCTCTGGG
2651 ACTTGGGAAT TCTGCCACAT ACACTTTATC TAACTTAAAG TGACGGAGTA
2701 GAAGCTTGGC ATCATTAGCT AGATATGGGA CCCTGGCAAG TGACCAAATC
2751 CTCTCTGAGC CAAGGTGGGA ACACAGTTAA TGCCTGTAAC ACGTGCTGAG
2801 CACAGCACAG TGCCTGGCAC ACAGCAAACA CTCAATAGAA TATTAGCTAC
2851 CATCATCCTG ATGTCGCTAT AAAGGCCAGC ATTTTTCTGA AAAGTTGGGG
2901 AAAATGGGAA AAGCAACAAG GCAACTAGTA GGTATCACTT ACCTTACCTG
2951 CCCAGACCCC ACACCCCTAG GTCTCCTCTC AAAGGAATTC CTGCCCCTCC
3001 CATGGCCCAT CTTGGTCCGA GAAGGGGGTG GTCATCCCCA GGCTAGCCAG
3051 CCACTTCTGA CCTGTGTGGC CTGCCTGGCT GGAAGGCCCA GGCAATGACA
3101 TGTTGCTCTC GCAGTTTGGA CTGAGACATG GAATGGGGCC GCAATTAACA
3151 ACAGGAAACA ATCTGAACAG ACTGAACCAC GAGCAGCAGA AAGGCAGAAG
3201 AGCAGCCGCT TCAGCCCCTT ACCATCCGAG ACCTGGGTGT GTGGTCTGTC
3251 TTGGTCACTC TCTCTGTCTC TCTTTCTCTC TTTCTTTCTC TGTCCCCAAG
3301 GCTGGAGTGC AGTGGTGCAA TCTTGGCTCA CTGCAACCTC CACCTCTGGG
3351 ATTCAAGCAA TTCTCCCACC TCAGCCTCTC GAGTAGCTGG GGCTACAGCT
3401 ATGCGCCACC ATGCCCAGCT AATTTTTTTT TTTTTTTTTT GAGATGGAGT
3451 CTTGCTCTGT CCCCCATGCT GGAGTGCAGT GGCATGATCT CGGCTCGCTG
3501 CAACCTCCTC CTCCTGGGTT CAAGCGATTC TCCTACCTCA GCCTCCCCAG
3551 TAGCTGGGAT TACAGGCGCC CACCACCACA CCTGGCTAAT TTTTATTTTT
3601 AGTAGAGATG GGGTTTCACC ATGTTGGCCA GGCTGGTCTC GAACTCCTGA
3651 CCTCATGATC CACCCGCCTC GGCCTCCCCA AGTGTTGGGA TTACAGGCGT
3701 GAGCCACTGC ACCCGGCCTA ATTTCTGTAT TTTTAGTAGA GATGGGGTTT
3751 CACGATGTTG GCCAGGCTGG TCTTAATCTA ACTTCAAGTG ATCTGCCCGC
3801 CTCGCCCTCT CAAAGTGCTG GGATTAGGCA TGAACTACCA TGCCCAGTGG
3851 GGTATTCTCT TTCAATAAAG CTCCTCTTTT CCAAGGAAGC CACACCAGAA
3901 CAGAGATGAA GACCAGTGGG AAAACATGGG AGCAACTCCG TGGGCAGGCC
3951 AGCGGGGAGG CCATGCTGCA AAGCTGCCGT GATTCCCTGG TGATCTCTCA
4001 GCAGGCCAAG GCCAGACATG TGAGGAAGGC CTTGAGGACT TCATTCTGTG
4051 CCTCTCCTTG GATGGAAGGG GGTGCTTTAG TGTGGCACTC CTGACTTTTC
4101 AATTGACTGG TGAAGAGGCC CTTGTGTGCA CCTCACTATG TCTGCCTAGG
4151 TCATGGGGGC TCCCTGGCCA AGAATGACGT GGTTCCCCCT TTCATCAGTC
4201 CGATTCGCAG TTTGTCTTAA CTGTAGTGGT ATAGCCAGAG CAAGAAAAAG
4251 AATGTGATTT AGGACAAATG ATTGGATGAG TGATTGGTAG ATGTCCTCAG
4301 CTATGGCGTG GTTTTGCAGG TCACTGTTCC ACCCACCTGG GCACAGCATA
4351 TACGCTTTTT CTCTTCCCCA TAATCCCGTA GGGGCTGCGA CTTCTGAAGC
4401 ACAAGAGGCA GAGGCGAACA GCTCCAGGTG CCCCTCTGGA GCTACCCTAC
4451 CTCATCTCCC AAGGGAGCGG CCACAGCCCA GAGTGGGGTC TTTCATTTTG
4501 TGATCTTTTC CCTTGACATT CAGCAAAAGC CCTGACAGTG GTAGAATAAA
4551 GGCAGGATGG GTGAGTGCAG AGTGATTCTG CTTTTGTTGG GTTTCAGGGA
4601 AACCCATAGG CAGATTCTGA ACCTGGTGGT TGATTCTACA TGTGGGAATT
4651 GTGGCTTTGA AGACCTCTGG ACATGAGAAC ATATTTCCAA GACAGAGGAT
4701 TCTATGGGGA CG5GTCACCA TTAAATGGTG TGCAAGCATA ATTCTGTTCA
4751 AAAATGAAGG CATGTTTAGA GGTGTGTCAC AGTTAAAAAC CAACCTGAAC
4801 TTTGCAGTTA GATTTTAAAA GATGGTCAGT TAGAGTAGAA ATAGCTTAGA
4851 ATATTCCATT GAGTCTAAGA TACAGTTAGA AATCAACATC TTTGAAATTA
4901 GGGTGTGTCT TTTAATCAGT TGATGTCAGA GTTTAACGGG CAGCATTTTT
4951 TTCTTTCTTG GGATTACAAA AAATGATGGT GCATTCTATA ATTGGCAGCA
5001 TCTTAGATCT GAGGAAGTAT GATACTTGTT TGACGGAATG GTTGACGGCA
5051 GAATTTTGTT AAAAAGCTAT ATCTTCACTG TATTTTAACA CATTATCTAA
5101 TTTAAGAAAT TGTTAAGATC CCCCACCTGG CAGAGGACCC AGTACAAAAT
5151 AGGCACTCAA TAGATGTTAC ACCAACTTTG GAAGGGCAAA CATATTTCTT
5201 AATGAGAGGC AGTCCTTCAT GTTTTGCAAT AAAATGACTT TTAAAAAAAA
5251 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AA
BLAST Results

No BLAST result

Medline entries

No Medline entry

Peptide information for frame 3

ORF from 57 bp to 1187 bp; peptide length: 377
Category: putative protein
Classification: no clue
Prosite motifs: LEUCINE_ZIPPER (19-40)
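The PROSITE leucine-zipper entry is a simple spacing rule: four leucines, each seven residues after the previous one (L-x(6)-L-x(6)-L-x(6)-L). Such a short rule can match by chance, which fits the tentative wording of the description ("might be a novel transcription factor"). The sketch below (hypothetical helper `leucine_zippers`) translates the pattern into a regular expression with a lookahead so overlapping matches are reported; the test string is the first 42 residues of this entry's peptide.

```python
import re

# PROSITE PS00029 (leucine zipper), L-x(6)-L-x(6)-L-x(6)-L, as a regular
# expression. The lookahead lets overlapping matches be reported.
LEUCINE_ZIPPER = re.compile(r"(?=(L.{6}L.{6}L.{6}L))")


def leucine_zippers(protein: str) -> list[tuple[int, int]]:
    """1-based (start, end) of every leucine-zipper pattern match."""
    return [
        (m.start() + 1, m.start() + len(m.group(1)))
        for m in LEUCINE_ZIPPER.finditer(protein.upper())
    ]


print(leucine_zippers("MTDDESESVLSDSHEGSELELPVIQLCGLVEELSYVNSALKT"))  # [(19, 40)]
```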



1 MTDDESESVL SDSHEGSELE LPVIQLCGLV EELSYVNSAL KTETEMFEKY

51 YAKLEPRDQR PPRLSEIKIS AADYAQFRGR RRSKSRTGMD RGVGLTADQK

101 LELVQKEVAD MKDDLRHTRA NAERDLQHHE AIIEEAEIRW SEVSREVHEF

151 EKDILKAISK KKGSILATQK VMKYIEDMNR RRDNMKEKLR LKNVSLKVQR

201 KKMLLQLRQK EEVSEALHDV DFQQLKIENA QFLETIEARN QELTQLKLSS

251 GNTLQVLNAY KSKLHKAMEI YLNLDKEILL RKELLEKIEK ETLQVEEDRA

301 KAEAVNKRLR KQLAEFRAPQ VMTYVREKIL NADLEKSIRM WERKVEIAEM

351 SLKGHRKAWN RMKITNEQLQ ADYLAGK
BLASTP hits

No BLASTP hits available
Alert BLASTP hits for DKFZphtes3_ianm, frame 3

No Alert BLASTP hits found

Pedant information for DKFZphtes3_lflnm, frame 3

Report for DKFZphtes3_lflnm.3

[LENGTH] 395

[MW] 46157.16
[pI]

[HOMOL] TREMBL:AF136711_1 product: "myosin heavy chain"; Amoeba proteus myosin heavy chain mRNA, complete cds. 5e-06
[FUNCAT] 99 unclassified proteins [S. cerevisiae, YOR216c] 7e-04

[BLOCKS] BL00563B Stathmin family proteins
[BLOCKS] PROD^ISD
[PROSITE] LEUCINE_ZIPPER 1
[PFAM] Helix-loop-helix DNA-binding domain

[KW] All_Alpha

[KW] LOW_COMPLEXITY 6.33 %

[KW] COILED_COIL m-L-5 %

30 

SE(3 GTRSSlilAHGIGRGRFGRVMTDDESESVLSDSHEGSELELPv'IfiLCGLVEELSYv'NSALKT 

SEG • 

PRD cccccccccccccceeeeeccccceeeeeccccccceeeeeeeeccchhhhhhhhhhhhh 
COILS 

35 

SE<2 ETEHFEKYYAKLEPRDti2RPPRLSEIKISAADYA<2FRGRRRSKSRTGNDRGVGLTADflKLE 

SEG * 

PRD hhhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhccchhhhhccccccccchhhhhhh 
40 COILS 



SE<2 LVflKEVADMKDDLRHTRANAERDLflHHEAIIEEAEIRUSEVSREVHEFEKDILKAISKKK 

SEG xxxxxxxxx 

45 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhc 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SE<2 GSILAT(3<VHKYIEDnNRRRDNI1KEICLRLKNVSLKV(3RKKMLL(JLR(3KEEVSEALHDVDF 

50 SEG 

PRD ccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



55 SEfl flflLKIENAflFLETIEARNCJELTtJLKLSSGNTLflVLNAYKSKLHKAIIEIYLNLDKEILLRK 

SEG xxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhh 
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COILS 

c 

SEfi ELLEKIEKETLt2VEEDRAKAEAVNKRLRK(2LAEFRAP(2VnTYVREKILNADLEKSIRnb)E 

SEG xxxxxxxxx

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCC 

SE12 RKVEIAENSLKGHRKAlilNRnKITNEflLflADYLAGK

SEG - 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccc 
COILS 






Prosite for DKFZphtes3_lflnm.3
PS00029 37->51 LEUCINE_ZIPPER PDOC00029

Pfam for DKFZphtes3_lflnm.3
Hf1l1_NAHE Helix-loop-helix DNA-binding domain 



*RRRNHNnRERRRRndINNUFeaLR])HIPHhnV. . . PNEKPLSKVEILRn 
RRR Nn E+ R++++ + + ++++ +E L V+

++ 

Query llfi RRR-DNnKEKLRLKI\IVSLKV<2RfCKMLLflL-R(2KEEVSEA-

LHDVDFflflL SM3 

Hnn AIEYIrsLfl*

IE ++L+ 

Query 2MM KIENAflFLE 252
group: testis derived

DKFZphtes3_11p12 encodes a novel 644 amino acid protein without
similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes.

unknown protein

Sequenced by MediGenomix

Locus: unknown

Insert length: 2161 bp

Poly A stretch at pos. 2087, no polyadenylation signal found
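The insert annotations report where the poly(A) stretch starts and whether a polyadenylation signal was found. The rules used by the original pipeline are not given, so the following is only a minimal sketch under assumed rules: the stretch is taken to be the 3'-terminal run of at least `min_len` A residues, and the signal is the canonical AATAAA hexamer searched within `window` nucleotides upstream of that run. The function names and both default thresholds are hypothetical.

```python
from typing import Optional


def poly_a_start(seq: str, min_len: int = 10) -> Optional[int]:
    """Return the 1-based start of the 3'-terminal run of A's,
    or None if that run is shorter than min_len (assumed threshold)."""
    seq = seq.upper()
    body = seq.rstrip("A")
    if len(seq) - len(body) < min_len:
        return None
    return len(body) + 1


def polyadenylation_signal(seq: str, window: int = 50,
                           motif: str = "AATAAA") -> Optional[int]:
    """Return the 1-based start of the last `motif` lying entirely within
    `window` nt upstream of the poly(A) stretch (or of the 3' end when no
    stretch is found), or None if there is none."""
    seq = seq.upper()
    tail = poly_a_start(seq)
    end = tail - 1 if tail is not None else len(seq)
    idx = seq.rfind(motif, max(0, end - window), end)
    return idx + 1 if idx != -1 else None
```

Applied to bases 2051-2090 of the DKFZphtes3_11p12 insert followed by part of its A tail, `2050 + poly_a_start(fragment)` gives 2087 and no AATAAA is found upstream. Other signal variants (for example ATTAAA) would need additional motifs.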




1 CCCGAGCCAG CAACCCTGAG GGGCGGCCGG GCAGCGCCGC CACCATGTTC
51 CTGGGCACCG GGGAGCCGGC CTTGGACACG AGTCACCTTA TCTCTCTAAG
101 CCGAGCGTCC CTGACCCCGC AGAAGCTGTG GCTGGGAACC GCAAAGCCAG
151 GAAGTCTGAC CCAGGCCCTG AACTCACCCC TCACCTGGGA GCATGCGTGG
201 ACTGGCGTCC CCGGCGGCAC TCCTGACTGT CTGACAGACA CCTTCAGAGT
251 GAAGAGGCCA CATCTCAGGC GCTCTGCCAG CAACGGTCAT GTCCCTGGGA
301 CTCCTGTCTA CAGAGAAAAA GAAGATATGT ATGACGAGAT TATTGAGTTA
351 AAGAAGTCAT TGCACGTGCA GAAGAGCGAC GTGGACCTGA TGAGAACGAA
401 GCTCCGGCGC CTGGAGGAGG AAAACAGCAG GAAGGACCGG CAGATAGAGC
451 AGCTCCTGGA TCCCAGCCGC GGCACGGATT TTGTTCGGAC TCTGGCAGAG
501 AAAAGGCCCG ATGCCAGTTG GGTCATTAAC GGGCTGAAGC AGAGGATCCT
551 GAAGCTGGAA CAGCAGTGCA AGGAGAAGGA CGGCACCATC AGCAAACTCC
601 AGACCGATAT GAAGACTACC AACCTGGAAG AGATGCGGAT CGCCATGGAG
651 ACATACTACG AGGAGGTGCA TCGTCTCCAG ACCCTCTTGG CAAGTTCTGA
701 AACCACCGGA AAGAAGCCCC TGGGGGAGAA GAAGACGGGC GCCAAAAGGC
751 AGAAGAAGAT GGGCAGTGCC CTCCTGAGCT TGTCCCGGAG TGTCCAGGAG
801 CTCACGGAAG AGAACCAGAG CCTGAAGGAG GACCTGGACC GCGTGCTGAG
851 CACCTCCCCA ACCATCTCCA AGACACAGGG TTATGTGGAG TGGAGCAAGC
901 CCCGGCTGCT GAGGCGCATT GTGGAGCTGG AGAAGAAACT AAGTGTGATG
951 GAGAGCTCAA AATCACACGC CGCAGAGCCA GTCAGATCAC ACCCGCCAGC
1001 CTGCCTTGCA TCCAGCTCTG CGCTGCACAG ACAGCCACGA GGGGACCGCA
1051 ACAAGGACCA CGAGCGTCTC CGAGGGGCTG TGAGAGACCT GAAGGAAGAG
1101 CGGACCGCGC TGCAGGAGCA GCTGCTGCAG AGAGATTTGG AGGTGAAGCA
1151 GCTCCTGCAG GCGAAGGCCG ACCTGGAGAA GGAGCTGGAG TGCGCGAGGG
1201 AGGGCGAGGA GGAGAGGAGA GAGCGAGAGG AGGTTTTGAG AGAGGAGATT
1251 CAGACACTTA CCAGCAAGCT CCAAGAATTG CAAGAAATGA AGAAAGAAGA
1301 GAAAGAGGAT TGCCCGGAAG TTCCTCATAA GGCCCAAGAG CTCCCAGCTC
1351 CCACTCCCAG CAGCAGGCAC TGCGAGCAAG ACTGGCCGCC GGATTCCAGC
1401 GAGGAGGGGC TCCCGCGGCC CCGCTCCCCC TGCTCTGATG GGAGAAGAGA
1451 CGCCGCGGCC AGAGTCCTGC AGGCCCAGTG GAAGGTGTAC AAGCACAAGA
1501 AAAAAAAGGC TGTTCTGGAT GAGGCGGCTG TGGTGCTTCA GGCAGCTTTC
1551 AGGGGACATC TCACGCGGAC AAAGCTCTTA GCAAGCAAAG CACATGGCTC
1601 AGAGCCACCC AGCGTGCCAG GCCTCCCAGA CCAGAGCTCT CCTGTGCCCC
1651 GCGTTCCGAG CCCCATCGCC CAGGCCACGG GCAGCCCTGT GCAGGAGGAG
1701 GCCATCGTCA TCATCCAGTC CGCTCTGCGG GCACACCTGG CCCGGGCCAG
1751 GCACAGTGCT ACCGGTAAAA GAACCACCAC CGCAGCTTCT ACCAGGAGGA
1801 GATCGGCTTC AGCCACACAC GGGGACGCCT CCTCCCCACC CTTCCTCGCA
1851 GCTCTTCCTG ACCCCTCTCC CTCAGGGCCA CAGGCCTTGG CACCTCTACC
1901 TGGGGATGAC GTCAACTCCG ATGATTCCGA CGATATTGTC ATTGCACCGT
1951 CTCTGCCCAC GAAGAACTTT CCAGTTTAGG TCCCCGTCAC TGTCTCCACG
2001 CCGTGATGGC AGCGCTGCCG AGGACATAGG AACCACGACT GGAAAGATAA
2051 TTTATCGTGT TAGGAGAAGA ACGATGATAC CTACTTAAAA AAAAAAAAAA
2101 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2151 AAAAAAAAAA A
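Nucleotide sequences in this listing are printed as rows that begin with the 1-based position of the first base, followed by up to five 10-nt blocks. A minimal sketch for turning such rows back into one contiguous sequence, assuming exactly that layout; it also checks that each row starts where the previous one ended, which catches a dropped or misordered block. `parse_sequence_rows` is a hypothetical helper, not part of the original pipeline.

```python
import re
from typing import Iterable

# "<position> <10-nt block> <10-nt block> ...", e.g. "51 CTGGGCACCG GGGAGCCGGC"
ROW = re.compile(r"^(\d+)\s+([ACGTN][ACGTN ]*)$")


def parse_sequence_rows(lines: Iterable[str]) -> str:
    """Concatenate position-prefixed sequence rows into one sequence.

    Lines that do not look like sequence rows (headers, blanks) are skipped.
    Raises ValueError if a row's start position is not the number of bases
    read so far plus one, e.g. when a block was dropped."""
    blocks = []
    length = 0
    for line in lines:
        m = ROW.match(line.strip().upper())
        if m is None:
            continue
        start = int(m.group(1))
        bases = m.group(2).replace(" ", "")
        if start != length + 1:
            raise ValueError(f"row starts at {start}, expected {length + 1}")
        blocks.append(bases)
        length += len(bases)
    return "".join(blocks)
```

Fed the first two rows of the DKFZphtes3_11p12 insert, the function returns 100 bases, with the ATG start codon at positions 45-47.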



BLAST Results 



No BLAST result 




Medline entries 



No Medline entry 




Peptide information for frame 3 




ORF from 45 bp to 1976 bp; peptide length: 644
Category: similarity to unknown protein 
Classification: unclassified 
Prosite motifs: RGD (332-334) 
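ORF coordinates, peptide length and frame are linked by simple arithmetic: a 1-based ORF from `start` to `end` (stop codon excluded) encodes (end - start + 1) / 3 residues. A minimal sketch, assuming the standard genetic code and assuming frames are numbered 1-3 on the forward strand as (start - 1) mod 3 + 1, which gives frame 3 for an ORF starting at 45 bp and frame 2 for one starting at 29 bp. `orf_peptide_length`, `reading_frame` and `translate` are hypothetical names.

```python
# Standard genetic code; codons are enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def orf_peptide_length(start: int, end: int) -> int:
    """Residues encoded by an ORF with 1-based inclusive coordinates
    whose end excludes the stop codon."""
    span = end - start + 1
    if span <= 0 or span % 3:
        raise ValueError(f"ORF span {span} is not a positive multiple of 3")
    return span // 3


def reading_frame(start: int) -> int:
    """Forward-strand frame (1-3) implied by a 1-based ORF start (assumed convention)."""
    return (start - 1) % 3 + 1


def translate(cds: str) -> str:
    """Translate codon by codon until the first stop codon (A/C/G/T only)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[i:i + 3].upper()]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)
```

For example, `orf_peptide_length(45, 1976)` returns 644, the number of residues in the peptide listing that follows, and translating bases 45-70 of the insert gives MFLGTGEP, its first eight residues.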


1 MFLGTGEPAL DTSHLISLSR ASLTPflKLlilL GTAKPGSLTU ALNSPLTWEH 
51 AUTGVPGGTP DCLTDTFRVK RPHLRRSASN GHVPGTPVYR EKEDMYDEII 
101 ELKKSLHVtJK SDVDLMRTKL RRLEEENSRK DR(JIE(3LL»P SRGTDFVRTL 

151 AEKRPDASWV INGLKflRILK LEfiCCKEKDG TISKLflTDMK TTNLEEMRIA
201 METYYEEVHR LflTLLASSET TGKKPLGEKK TGAKRtSKKMG SALLSLSRSV
251 (2ELTEEN(2SL KEDLDRVLST SPTISKTflGY VEUSKPRLLR RIVELEKKLS
301 VMESSKSHAA EPVRSHPPAC LASSSALHRfl PRGDRNKDHE RLRGAVRDLK
351 EERTALUEflL LGRDLEVKflL LflAKADLEKE LECAREGEEE RREREEVLRE

401 EI(3TLTSKL(J ELcJEMKKEEK EDCPEVPHKA (2ELPAPTPSS RHCEflDhlPPD
451 SSEEGLPRPR SPCSDGRRDA AARVLflAflbJK VYKHKKKKAV LDEAAVVLUA
501 AFRGHLTRTK LLASKAHGSE PPSVPGLPDfi SSPVPRVPSP IA<2ATGSPV(2
551 EEAIVIIdJSA LRAHLARARH SATGKRTTTA ASTRRRSASA THGDASSPPF
601 LAALPDPSPS GPflALAPLPG DDVNSDDSDD IVIAPSLPTK NFPV




BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_11p12, frame 3

No Alert BLASTP hits found

Pedant information for DKFZphtes3_11p12, frame 3

Report for DKFZphtes3_11p12.3






CLENGTH3 bMM 

EMIiO ?iaiQ.m 

EpI] fi-fiO 

IH0I10L3 TREMBL : AB02fl c mb_l gene: n KIAA1023 n i product: 

"KIAA1023 protein n i Homo sapiens mRNA for KIAA1023 protein-i 
partial cds- □•□ 

EFUNCATJ 3D. 03 organization of cytoplasm ES. cerevisiaei 
YDLDSflwl 2e-D7 

EFUNCAT3 DB-07 vesicular transport (golgi network-i etc) ES- 
cerevisiae-i YDLOSBwJ 2e-07 

EFUNCATJ 11 unclassified proteins ES. cerevisiaei YLRSOIcJ 

3e-0b 

EFUNCATJ 3D-D 1 ! organization of cytoskeleton ES. cerevisiaei 

YDR3Sbu3 2e-0S 

CFUNCAT3 OT-IO nuclear biogenesis ES- cerevisiaei YDR35bw3 

2e-Q5 

EFUNCATJ 03-22 cell cycle control and mitosis ES. cerevisiaei

YDR3Sbw3 2e-DS

EFUNCAT3 Ifl classification not yet clear-cut 

YJR13Mc3 Me-05 

iblocks3 prioissm 

[BLOCKS] BLO0b27B GHHP kinases ATP-binding domain proteins 

EBLOCKSIl BL0032bC Tropomyosins proteins 

EBLOCKSH BLOllbOB Kinesin light chain repeat proteins 

EBLOCKSIl BLD0B20D Glucoamylase proteins region proteins 

EBL0CKS3 BP04m?C 

EBL0CKS3 BL00412B Neuromodulin (GAP-43) proteins 

EEC3 3- b. 1.32 Myosin ATPase 3e-0fl 

EPIRKIO tandem repeat 3e-0fi 

EPIRKIO transmembrane protein 2e-07 

EPIRKIO muscle contraction 3e-0B 

EPIRKIO actin binding 3e-Dfi 

EPIRKIO ATP 3e-0fi 

EPIRKIO thick filament 3e-DB 

EPIRKIO alternative splicing 7e-07 

EPIRKIO coiled coil 3e-0S 

EPIRKIO P-loop 3e-06 

EPIRKIO heptad repeat 2e-D7 

EPIRKIO methylated amino acid 3e-DB 

EPIRKIO hydrolase 3e-0a 

EPIRKIO Golgi apparatus 2e-D7 

ESUPFAM3 myosin heavy chain 3e-0fl 

ESUPFAM3 myosin motor domain homology 3e-DB 

ESUPFACO alpha-actinin actin-binding domain homology ae-Db 

ESUPFAH3 plectin Be-Db 

ESUPFAM3 ribosomal protein SID homology Be-Db 

ESUPFAIU giantin 2e-D7 
EPR0SITE3 RGD 1 

EKIO All_Alpha 

EKIO L0U_C0nPLEXITY 14- bO '/. 
SEfl I1FLGTGEPALDTSHLISLSRASLTP<2KLIi)LGTAKPGSLTflALNSPLTlilEHAIiJTGVPGGTP

SEG

PRD cccccccccccceeeeeeeecccccceeeeecccccceeeeccccccccccccccccccc 
COILS 



SEfl DCLTDTFRVKRPHLRRSASNGHVPGTPVYREKEBMYDEIIELKKSLHVflKSDVDLriRTKL

SEG 

PRD cccccchhhhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCC 


SEA RRLEEENSRKDR<2IEl2LLDPSRGTDFVRTLAEKRPDASli)VINGLKt2RILKLE(2flCKEKDG 

SEG 

PRI> hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCC

SEiJ TISKL(2TI>I1KTTNLEEI1RIAf1ETYYEEVHRL(3TLLASSETTGKKPLGEKKTGAKRi3KKMG 

SEG 

PRI> hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS

cccc 

SE(J SALLSLSRSV(2ELTEEN(2SLKE1>LDRVLSTSPTISKT(3GYVEUSKPRLLRRIVELEKKLS 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SE<2 VI1ESSKSHAAEPVRSHPPACLASSSALHRflPRGDRNKDHERLRGAVRDLKEERTAL(3E(JL 

SEG

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 



SE<2 L(3RDLEVKi3LLi3AKADLEKELECAREGEEERREREEVLREEI(3TLTSK:Li8EL(3EI1KKEEK

SEG xxxxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCC 


SEU EDCPEVPHKAQELPAPTPSSRHCEfiDUPPDSSEEGLPRPRSPCSDGRRDAAARVLflAflUK 

SEG x x 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCC

SEfi VYKHKKKKAVLDEAAVVLUAAFRGHLTRTKLLASKAHGSEPPSVPGLPDflSSPVPRVPSP 

SEG xxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhcccccccccccccccccccccccc 
COILS



SE(J IAflATGSPVdJEEAIVIIflSALRAHLARARHSATGKRTTTAASTRRRSASATHGDASSPPF 
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

PRD ccccccccccceeeehhhhhhhhhhhhhhhhccccceeehhhhhhhhhccccccccccce 
COILS 




SE<3 LAALPDPSPSGPflALAPLPGDDVNSDDSDDIVIAPSLPTKNFPV 

SEG xxxxxxxxxxxxx 

PRD eeecccccccccccccccccccccccccceeeeecccccccccc 
COILS , 




Prosite for DKFZphtes3_11p12.3
PS00016 332->334 RGD PDOC00016
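The Prosite hits in this listing are pattern matches: RGD (PS00016) is the pattern R-G-D, and LEUCINE_ZIPPER (PS00029) is L-x(6)-L-x(6)-L-x(6)-L. The following minimal sketch converts such simple patterns to Python regular expressions and reports every 1-based match, overlaps included. It handles only single residues, x, x(n), x(n,m), [..], {..} and the < and > anchors, not the full PROSITE syntax. `prosite_to_regex` and `scan` are hypothetical names.

```python
import re
from typing import List, Tuple

# One pattern element: optional '<', a residue / x / [..] / {..},
# an optional repeat count (n) or (n,m), and an optional '>'.
ELEMENT = re.compile(r"(<?)([A-Zx]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'L-x(6)-L' into a regex."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        n_term, core, lo, hi, c_term = m.groups()
        if core == "x":
            core = "."
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # {P} means "anything but P"
        if lo is not None:
            core += "{" + lo + ("," + hi if hi else "") + "}"
        parts.append(("^" if n_term else "") + core + ("$" if c_term else ""))
    return "".join(parts)


def scan(protein: str, pattern: str) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) of every match, overlaps included."""
    # A capturing group inside a lookahead lets finditer report overlapping hits.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.start() + len(m.group(1))) for m in regex.finditer(protein)]
```

As a usage example, scanning residues 151-200 of the DKFZphtes3_20h12 peptide for the leucine zipper pattern and adding the offset of 150 finds a single match at 167-188. This reads the OCR-damaged characters at positions 183 and 185 as I and Q, an assumption that does not affect any leucine of the match.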



(No Pfam data available for DKFZphtes3_11p12.3)
group: transmembrane protein

DKFZphtes3_20h12 encodes a novel 1204 amino acid protein without
similarity to known proteins.

The novel protein contains one transmembrane region and two leucine
zippers.
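The report does not state how the transmembrane region was predicted. To illustrate how such a call can be made, the following minimal sketch uses a different, well-known technique: the Kyte-Doolittle hydropathy scale averaged over a 19-residue sliding window with a 1.6 cutoff, the settings Kyte and Doolittle suggested for membrane-spanning segments. It is not the Pedant method, and `hydrophobic_segments` is a hypothetical helper.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy index (Kyte & Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_segments(protein: str, window: int = 19,
                         threshold: float = 1.6) -> List[Tuple[int, int]]:
    """Return merged 1-based (start, end) spans covered by windows whose mean
    hydropathy exceeds `threshold`. Unknown residues score 0.0 (assumption)."""
    protein = protein.upper()
    spans: List[Tuple[int, int]] = []
    for i in range(len(protein) - window + 1):
        scores = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in protein[i:i + window]]
        if sum(scores) / window > threshold:
            start, end = i + 1, i + window
            if spans and start <= spans[-1][1] + 1:
                spans[-1] = (spans[-1][0], end)  # extend the current span
            else:
                spans.append((start, end))
    return spans
```

On a toy sequence of 20 leucines flanked by 10 lysines on each side, the windows above the cutoff merge into the single span (6, 35).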

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes and as a new marker for
testicular cells.

putative protein

perhaps complete cds.
Pedant: TRANSMEMBRANE 1

Sequenced by MediGenomix

Locus: unknown

Insert length: Sflm bp
Poly A stretch at pos. SA74, no polyadenylation signal found



1 CTCTGCCTTT CCTCTCGCAG CCACCCTTCC TCTCAGACCA GTACGGTGGC
51 CGACGGGAGT CAGACGCTGG GGATGAATGA AGGATCAACA AACAGTAATA
101 ATGACTGAAT GTACAAGTCT TCAGTTTGTC AGCCCTTTTG CTTTTGAGGC
151 AATGCAGAAG GTGGATGTTG TTTGCCTGGC ATCTTTAAGT GATCCAGAAT
201 TAAGACTTCT TCTGCCCTGT TTGGTACGGA TGGCACTTTG TGCACCTGCT
251 GACCAGAGCC AAAGCTGGGC TCAGGATAAG AAACTCATCC TTCGCCTTCT
301 TTCTGGAGTG GAAGCTGTCA ACTCCATTGT TGCATTGTTG TCCGTGGACT
351 TTCATGCTTT AGAACAAGAT GCCAGCAAAG AACAGCAGCT TAGGCATAAA
401 CTTGGAGGAG GCAGTGGAGA GAGCATCCTG GTATCACAGC TTCAGCATGG
451 ACTGACGTTA GAGTTTGAAC ACAGTGATTC ACCTCGTCGA TTGCGTCTTG
501 TGCTTAGTGA ACTGTTGGCA ATTATGAACA AGGTGTCTGA GTCCAACGGA
551 GAATTTTTTT TCAAGTCTTC TGAACTTTTT GAGAGTCCAG TATATTTGGA
601 GGAAGCTGCA GATGTACTTT GTATTTTACA AGCAGAGCTC CCTTCCTTGC
651 TCCCTATAGT TGATGTAGCT GAAGCTTTGC TACATGTTAG AAATGGTGCC
701 TGGTTCTTGT GTCTCTTGGT GGCCAATGTT CCTGATAGTT TTAATGAAGT
751 TTGTAGGGGC CTGATAAAAA ATGGAGAACG ACAAGATGAA GAAAGTCTTG
801 GAGGAAGGCG CAGGACAGAT GCCTTACGCT TCTTGTGTAA AATGAATCCT
851 TCTCAGGCCC TCAAGGTCCG AGGCATGGTG GTGGAAGAAT GTCACTTGCC
901 AGGCCTTGGT GTGGCTTTGA CATTGGATCA TACTAAAAAT GAAGCTTGTG
951 AGGATGGAGT GAGTGACTTG GTTTGTTTTG TAAGTGGTTT GCTTCTTGGA
1001 ACAAATGCGA AAGTCCGGAC TTGGTTTGGA ACTTTTATCC GAAATGGACA
1051 GCAGAGAAAA AGAGAGACCA GCAGTTCTGT CCTTTGGCAG ATGAGAAGGC
1101 AGCTTCTTCT GGAGTTGATG GGCATTCTTC CCACAGTAAG AAGCACCCGA
1151 ATTGTGGAAG AAGCTGATGT GGATATGGAG CCCAATGTGT CTGTGTATTC
1201 GGGGCTGAAA GAAGAGCATG TTGTGAAAGC CAGTGCACTC TTACGTCTGT
1251 ACTGTGCTTT GATGGGGATC GCTGGACTCA AACCAACTGA AGAAGAAGCT
1301 GAGCAATTAC TGCAGTTGAT GACGAGCCGT CCTCCTGCTA CGCCAGCTGG
1351 GGTTCGCTTT GTTTCACTTT CCTTTTGTAT GCTACTGGCC TTTTCTACAC
1401 TTGTCAGTAC ACCTGAACAG GAGCAGCTGA TGGTGGTGTG GCTAAGTTGG
1451 ATGATAAAAG AAGAAGCGTA TTTTGAGAGT ACTTCAGGCG TCTCTGCTTC
1501 TTTTGGGGAG ATGTTATTAT TGGTGGCTAT GTACTTTCAC AGCAACCAGC
1551 TTAGTGCTAT CATTGACTTG GTCTGTTCCA CTTTGGGGAT GAAGATTGTA
1601 ATTAAGCCAA GCTCCTTGAG CAGGATGAAG ACAATCTTCA CACAGGAAAT
1651 TTTTACTGAG CAGGTTGTCA CAGCTCATGC AGTTCGGGTC CCTGTCACCA
1701 GCAACCTGAG TGCCAACATT ACTGGATTTT TGCCTATTCA TTGTATTTAC
1751 CAGCTTCTCA GGAGCCGTTC CTTTACCAAG CACAAAGTGT CAATAAAAGA
1801 TTGGATTTAT AGACAGCTGT GTGAAACCTC TACTCCACTT CATCCTCAAT
1851 TACTTCCTTT GATTGATGTG TACATAAATT CTATACTTAC TCCTGCGTCG
1901 AAATCTAATC CAGAAGCCAC AAATCAGCCA GTCACAGAAC AGGAGATACT
1951 CAATATTTTC CAAGGAGTCA TTGGGGGTGA CAACATCCGC CTTAATCAGC
2001 GTTTCAGTAT CACAGCACAG CTTTTGGTGC TCTACTATAT ACTGTCTTAT
2051 GAAGAGGCTC TTCTAGCAAA CACGAAGACT TTAGCTGCCA TGCAAAGAAA
2101 GCCCAAATCA TATTCTTCTT CTTTAATGGA TCAGATTCCT ATCAAATTCC
2151 TTATTCGACA GGCTCAAGGG CTGCAGCAGG AGTTGGGAGG GTTGCATTCA
2201 GCTTTACTAC GTCTCCTTGC TACTAACTAC CCACATTTAT GTATTGTGGA
2251 TGACTGGATT TGTGAAGAAG AAATCACAGG GACTGATGCC CTGCTACGGC
2301 GAATGCTCCT GACTAATAAT GCTAAAAATC ATTCTCCCAA ACAACTCCAA
2351 GAAGCATTTT CAGCTGTCCC AGTAAATCAC ACACAAGTGA TGCAGATTAT
2401 AGAACACTTG ACTCTACTCT CTGCCAGTGA ACTTATACCA TATGCGGAAG
2451 TGTTAACATC CAATATGAGC CAGCTATTGA ATTCAGGGGT TCCACGGAGA
2501 ATTCTGCAAA CAGTCAATAA ACTATGGATG GTTCTTAATA CTGTGATGCC
2551 TAGAAGGCTA TGGGTAATGA CGGTTAATGC ACTTCAGCCT TCAATAAAGT
2601 TTGTACGACA ACAAAAGTAT ACTCAGAATG ACCTGATGAT AGATCCTCTC
2651 ATTGTCCTAA GGTGTGATCA GAGGGTTCAC AGATGCCCCC CACTGATGGA
2701 TATTACCCTA CACATGTTGA ATGGATATCT TCTTGCATCT AAAGCCTACC
2751 TTAGTGCTCA TCTGAAGGAA ACAGAGCAAG ATAGGCCTTC CCAGAATAAT
2801 ACAATTGGTT TAGTTGGACA AACTGATGCT CCGGAAGTTA CCAGGGAAGA
2851 ATTGAAAAAT GCATTACTGG CCGCTCAGGA TAGTGCAGCT GTCCAGATTC
2901 TCTTAGAGAT TTGCCTACCT ACTGAAGAGG AGAAAGCAAA TGGTGTCAAT
2951 CCAGATAGCT TGTTAAGAAA TGTTCAAAGT GTTATTACCA CCAGCGCTCC
3001 AAATAAGGGA ATGGAGGAAG GAGAAGACAA TTTGCTCTGT AACCTTCGAG
3051 AAGTTCAGTG CCTTATCTGT TGTCTCTTGC ACCAAATGTA CATTGCAGAT
3101 CCCAACATTG CTAAGCTTGT TCACTTTCAG GGTTATCCAT GTGAACTTTT
3151 GCCTCTGACG GTCGCAGGTA TTCCATCTAT GCACATCTGT CTAGATTTCA
3201 TACCTGAGCT TATTGCACAG CCAGAACTTG AGAAACAGAT ATTTGCTATC
3251 CAGTTGCTTT CTCACTTGTG TATACAATAT GCATTACCAA AGTCACTTAG
3301 TGTGGCTCGT TTAGCTGTCA ATGTCATGGG AACTTTGTTA ACAGTTTTAA
3351 CACAGGCTAA GCGGTATGCT TTTTTTATGC CAACTCTGCC AAGTTTGGTC
3401 TCTTTTTGTC GAGCATTTCC TCCATTGTAT GAGGATATTA TGTCTTTGCT
3451 GATCCAAATA GGGCAAGTTT GTGCCTCTGA TGTTGCCACT CAGACAAGAG
3501 ACATTGATCC AATTATTACA CGTCTTCAAC AAATAAAGGA GAAACCAAGT
3551 GGATGGTCTC AAATCTGTAA AGATTCATCT TATAAAAATG GATCCAGGGA
3601 CACTGGAAGC ATGGATCCTG ATGTACAGCT CTGTCACTGT ATTGAAAGAA
3651 CAGTAATTGA AATAATAAAT ATGAGTGTTA GTGGAATTTA AAACAAAATT
3701 TAAAACAACA AAAAGTTGTT TGCTGCATAT ACCCAACATG AATCTGCATA
3751 TTAGTAACAA CTCTAAACTG AATGGGAACA GTAAAGTATT GTCTTGGAAT
3801 CACTAAAACA ATTCAATTCA ACATGAGTAT AGTTTAGAAC TTTATGAGAA
3851 TTATGCTTGC TTGTTTCTGA TTGGCACATC TTTGGATCTA CTTTGCTGAT
3901 ATGTTTCTAT TGTAGCAGCT GAGCTTTTTT TTTTTCCACT GGGAACACAT
3951 GTAAGAAACT CATTATTGGA AAGGGAATTT GGCCTTGTAT TTAGCTTTTG
4001 AAGTGAAGAC TGCCATGCCT TTAATTTCTT ATAAAAATGA GTCTGTGGGT
4051 AGCCCTAGTG TTTATTTTAA CTGTGAGCTT GTAACAGAAT GTGACAAAGA
4101 TGCAAAGATG GGAGAGGAAA AAAGGGTAAA GGGAAAGGAG AATTAAGGAA
4151 ATAATAGGAG TTAAAAACAC AAGTAGAAAT CTCAAAGATT TGCAGTGCAA
1201 
4251 
4301 
1*351 

4401 
4451 

4501 
4551 
4bDl 
4b51 
4701 
4751 
4flQl 
4fl51 
4^01 
4^51 
SD01 
5D51 
5101 
5151 
5B01 
5251 
5301 
5351 
5401 
5451 
5501 
5551 
SbOl 
Sb51 
5701 
5751 
SfiDl 
5651 



GTAATAGTAA 
AGACTTTTAA 
AGATGAGGAT 
TGAAGCCATA 
CTGGTGTAAA 
AGATCTCATT 
ACTTACCTAA 
AGAGCAATAA 
TATAATTCAT 
TTTGAACCTG 
ATTTTGCAGT 
GGTAAACCCT 
TTTTTTCTCT 
GAAACAGACC 
TCTTCCCCTG 
GCAACTTCTG 
GTAGCTGGGA 
AGACGGGGGT 
GGTGATCTGC 
CCACCGCACC 
AGAAGGTGTA 
TAAATAGTCA 
CCCTACCTCA 
AGTGCATAGT 
TTAATGTTCA 
ATCTTAAAAA 
TAGTCCCTTA 
CTTGCGATAT 
AAAAGTATGT 
AAAGACTG6T 
CATCAGATTT 
GTTAATATTC 
TTTTTTTATT 
TTATAAACAT 



TGCAAGTTGG 
AAAGGCAAGT 
TTAAAGATTT 
TGTTTAAAGA 
AGTGTTTACA 
TAGTTTTATG 
TGCTCAGAGG 
AACCCACTGG 
ATCTTACTTT 
AGGAAAAGTT 
CCTCTGTCAG 
AATCATTAAA 
CTGAGATATA 
TCTTCACTTG 
TTGCCCAGGC 
CCACCTGGGT 
CTGCAGGCAC 
TTGCCATGTT 
CCACCTTGGC 
TGGCCAGACC 
GTTTTTGAGA 
CATCTCATTT 
TATTCTATGA 
ATCACATGTG 
ATTAAGTAAT 
TAGCATAAGA 
A6ATTAAATA 
TTTGTGTGAA 
TTTCTAATTA 
CATAACCTGC 
GTTGATGATG 
ATGTATTTTA 
CAAGTGAAAA 
TCATATTCTT 



AATTCTAGTT 
AGCTTTTGTA 
CACATATTTG 
GATACTTGAA 
GAAACATCTT 
TTTTAAATTT 
GGGGAAATAT 
ATTAAAGAGC 
GAGAAGATCT 
AAAGTGTAGA 
TAACTTCCAT 
AAAAAATTAT 
CCTCAATCAC 
TGTTTTTTTT 
TGGAGTGCAG 
TCAAGGGATT 
GCGCCACCTG 
GCCCAGACTG 
CTCCCAAAGT 
GCTTCACTTG 
AATGAAATTT 
TCTTCCTTTG 
GAATGAGTTT 
ATAGAATATT 
TTTGATGTGA 
ATTTTCATAT 
CAACTGCTCC 
TAGATATGCC 
AATGCAGTGC 
TGTGTTAAAA 
TAAATAAAAT 
AGTTAAGGTT 
CAGATGTGTG 
TATCAAACAA 



CTCAAGAAAG 
AATGATTTCT 
CTTCAATTTT 
TAATTTGGAA 
TGTTCAAAGA 
ATTTTTATAA 
GTAtCAAATT 
TCTTGGTTTG 
TTGAGTAAGA 
AAATATTGTC 
TGATTAGGCA 
CAATGTAGAA 
ACACTTCCCC 
TTTTTTTTCC 
TGGGATGATC 
CTCGTGCCTC 
TATTTTTGTA 
GTTTTGAACT 
GGTGGGATTA 
TAAAAGAAAT 
AACTTTAGCC 
TAAAATGGGG 
GTAGCTGTTT 
TATAACTTTT 
AAAATAAAAG 
TTTTAAACAA 

CTAGGAGTTC 
ACATTCCTGG 
TAATCACATA 
GTGTAAATAT 
ATAAAATTTG 
CAGCTATTTT 
AAAAAAAAAA 
AGTATTGAGA
GTGGAAATAC 
TATTAATATA 
TTTTAAGATA 
AGAACCTGAG 
TGCTTTATTA 
AAATGAAGGT 
TCATCAGGAT 
AAATGCAGTG 
TTGCCGAAGG 
GACATATTCA 
AGTAATTCCC 
ACCCCCACTT 
TGAGGTGGAG 
TTGGCTCACT 
AACCTCCTGA 
TTTTTAGTAG 
CCTGGCCTCA 
CAGGTGTGAG 
TAGGCTAATA 
TTTTCACTAG 
TTACTACTGG 
CAAATCATGA 
TATTAGATGC 
TAATAAAAGT 
GGCAGTTTTG 
AAACTGAGGC 
AGAAAAAGTT 
ATCAATATTC 
TGCTCTTTTT 
ATTAGTAAAT 
TCACAATGTG 
GAATATTGGT 
AAAA 



BLAST Results 



No BLAST result



Medline entries 






No Medline entry 






Peptide information for frame 2



ORF from 77 bp to 3688 bp; peptide length: 1204
Category: putative protein 
Classification: unclassified

Prosite motifs: LEUCINE_ZIPPER (167-188)
LEUCINE_ZIPPER (692-713)
1 MKDtfflTVIMT ECTSLflFVSP FAFEAMflKVD VVCLASLSDP ELRLLLPCLV

51 RMALCAPADfl SflSlilAflDKKL ILRLLSGVEA VNSIVALLSV DFHALEdJDAS

101 KEflflLRHKLG GGSGESILVS (JLflHGLTLEF EHSDSPRRLR LVLSELLAIM

151 NKVSESNGEF FFKSSELFES PVYLEEAADV LClLflAELPS LLPIVDVAEA

201 LLHVRNGAUF LCLLVANVPD SFNEVCRGLI KNGERflDEES LGGRRRTDAL

251 RFLCKMNPSC2 ALKVRGMVVE ECHLPGLGVA LTLDHTKNEA CEDGVSDLVC

301 FVSGLLLGTN AKVRTUFGTF IRNG(2<JRKRE TSSSVLUlJMR RfiLLLELflGI

351 LPTVRSTRIV EEADVDMEPN VSVYSGLKEE HVVKASALLR LYCALMGIAG

401 LKPTEEEAEd LLfiLMTSRPP ATPAGVRFVS LSFCMLLAFS TLVSTPEflEfi

451 LMVVWLSblMI KEEAYFESTS GVSASFGEML LLVAMYFHSN (2LSAIIDLVC

501 STLGMKIVIK PSSLSRMKTI FTflEIFTEflV VTAHAVRVPV TSNLSANITG

551 FLPIHCIYflL LRSRSFTKHK VSIOUIYRfl LCETSTPLHP (3LLPLIDVYI

601 NSILTPASKS NPEATNflPVT EGJEILNIFUG VIGGDNIRLN <2RFSITAl2LL

651 VLYYILSYEE ALLANTKTLA AMflRKPKSYS SSLMDfllPIK FLIR<2A<3GLl2

701 (3ELGGLHSAL LRLLATNYPH LCIVDDUICE EEITGTDALL RRHLLTNNAK

751 NHSPKflLflEA FSAVPVNHTfl VMflllEHLTL LSASELIPYA EVLTSNMSflL

801 LNSGVPRRIL (2TVNKLUMVL NTVMPRRLUV MTVNALflPSI KFVR(2UKYT(J

851 NDLMIDPLIV LRCD<2RVHRC PPLMDITLHM LNGYLLASKA YLSAHLKETE

901 <2DRPS<2NNTI GLVGflTDAPE VTREELKNAL LAA<3DSAAV<3 ILLEICLPTE

951 EEKANGVNPD SLLRNVflSVI TTSAPNKGME EGEDNLLCNL REVflCLICCL

1001 LHfiMYIADPN IAKLVHFC2GY PCELLPLTVA GIPSMHICLD FIPELIAflPE

1051 LEKUIFAIUL LSHLCIflYAL PKSLSVARLA VNVMGTLLTV LTflAKRYAFF

1101 MPTLPSLVSF CRAFPPLYED IMSLLI(2IG<2 VCASDVATflT RDIDPIITRL

1151 flfllKEKPSGU SdlCKDSSYK NGSRDTGSMD PDV(2LCHCIE RTVIEIINMS

1201 VSGI



BLASTP hits

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_20h12, frame 2


No Alert BLASTP hits found 

Pedant information for DKFZphtes3_20h12, frame 2




Report for DKFZphtes3_20h12.2



[LENGTH! ISOM 

EMIO 13M347-53

EpIJ 5-75 

EHOMOLJ TREMBL:CEZC37b_3 gene: "ZCSlk-h"--, Caenorhabditis 

elegans cosmid ZC3?b Se-5S 

EPR0SITE1 LEUCINE_ZIPPER 5 

CKIO TRANSMEMBRANE 1

EKID L0U_C0MPLEXITY 5-57 'A 

EKIiD COILED COIL 3-33 */. 



SEC MKD(J(3TVIMTECTSL£JFVSPFAFEAM(3KV]>VVCLASLSI>PELRLLLPCLVRMALCAPADfl

SEG 

PRD cccceeeeeeeccceeecchhhhhhhhheeeeeeecccchhhhhhhchhhhhhhhccccc 
MEM

SE<2 S(2SWA<2DKKLILRLLSGVEAVNSIVALLSVDFHALE<2DASKEl2<2LRHKLGGGSGESILVS

SEG 

PRD hhhhhhhhhhhhhhhccccccccccccccccchhhhhhhhhhhhhhhhhcccccceeeec 
COILS 



MEI1

SEC £3L(2HGLTLEFEHSDSPRRLRLVLSELLAIMNKVSESNGEFFFKSSELFESPVYLEEAADV 

SEG xxxxxxxxxxxx 

PRD ccccccceeeecccccchhhhhhhhhhhhhhhhhhcccccccccccccccchhhhhhhhh 
COILS



MEM 

SE<2 LCILfiAELPSLLPIVDVAEALLHVRNGAWFLCLLVANVPDSFNEVCRGLIKNGERflDEES 

SEG

PRD hhhhhhcccccchhhhhhhhhhhhhccchhhhhheeeccccccchhhhhccccccccccc 
COILS 



MEM 


SE<2 LGGRRRTDALRFLCKMNPSflALKVRGMVVEECHLPGLGVALTLDHTKNEACEDGVSDLVC 

SEG 

PRD ccccchhhhhhhhhhcccceeeeeeeeeeeeeccccccceeeecccccccccccccceee 
COILS 


MEM : 

SE<3 FVSGLLLGTNAKVRTUFGTFIRNGfl(3RKRETSSSVLIillSnRRflLLLELnGILPTVRSTRIV 

SEG 

PRD eeeccccccceeeeeeeeeeeecchhhhhcccchhhhhhhhhhhhhhhhccccceeeeee
COILS 



MEfl 

SE<3 EEADVDMEPNVSVYSGLKEEHVVKASALLRLYCALflGIAGLKPTEEEAEtfLLtfLflTSRPP

SEG 

PRD eeeccccccceeeeccccchhhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhcccc 
COILS 



MEfl

SE<2 ATPAGVRFVSLSFCMLLAFSTLVSTPEflEflLMVVIilLSUMIKEEAYFESTSGVSASFGEML 

SEG 

PRD cccceeeeeehhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhcccccccchhhhhh 
COILS



MEfl fHIMflMflflMMflMMMMflflM 

SE(2 LLVAflYFHSNfiLSAIIDLVCSTLGMKIVIKPSSLSRflKTIFTlSEIFTECJVVTAHAVRVPV 

SEG

PRD hhhhhhhccchhhhhhhhhhhhccceeeeeccccchhhhhhhhhhhhhhhhhhhhheeec 
COILS 
mem

se<3 tsnlsanitgflpihciy<2llrsrsftkhkvsikdii]iyr<2lcetstplhp(2llplidvyi 

SEG 

PRD cccccceeeeeehhhhhhhhhhhhhcccccccchhhhhhhhhcccccccccccccceQee
COILS 



mem 

sefi nsiltpasksnpeatn(3pvte<2eilnift2gviggdnirlnt3rfsital2llvlyyilsyee

SEG 

PRD eeccccccccccccccccchhhhhhhhhhhccccccceeeehhhhhhhhhhhhhhhhhhh 
COILS 



MEM

SE<2 ALLANTKTLAAM<2RKPKSYSSSLMD<2IPIKFLIRflAfiGLfi<2ELGGLHSALLRLLATNYPH 

SEG xxxxxxxxxxxxxx'xxxxx 

PRD hhhhhhhhhhhhhhcccccccccccchhhhhhhhhhhhhhhhhcccchhhhhhhhhcccc 
COILS

CCCCCCCCCCCCCCCCCCCCCCCCCCCC 

MEM 

SE<2 LCIVDDUICEEEITGTDALLRRMLLTNNAKNHSPK<3L(2EAFSAVPVNHT(2VM(2IIEHLTL 

SEG

PRD eeeecceeeeechhhhhhhhhhhhhhccccccccchhhhhhhhhhcccchhhhhhhhhhh 
COILS 



MEM 


SE<2 LSASELIPYAEVLTSNMS(3LLNSGVPRRIL«2TVNKLlilMVLNTVMPRRH»IVMTVNALflPSI 

SEG 

PRD hhhhhhhhhhcccccchhhhhccccchhhhhhhhhhhhhhhhccccchhhhhhcccccch 
COILS 


MEM 

SEfl KFVRi3<2KYT<2NDLMIDPLIVLRCDl2RVHRCPPLMDITLHMLNGYLLASKAYLSAHLKETE 

SEG 

PRD hhhhhhhcccccccccceeeeecccccccccccceeeccccccchhhhhhhhhhhhhhhh
COILS 



MEM 

SEfl (2DRPS(3NNTIGLVG<2TDAPEVTREELKNALLAAl2DSAAVl2ILLEICLPTEEEKANGVNPD

SEG v 

PRD ccccccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccc 
COILS 



MEM

SE<2 SLLRNV(2SVITTSAPNKGMEEGEDNLLCNLREV<JCLICCLLH<2MYIADPNIAKLVHF(2GY 

SEG 

PRD cceeeeeeeeeecccccccccccchhhhhhhhhhhhhhhhhhhhhhhccccceeeecccc 
COILS



MEM 
SE<3 PCELLPLTVAGIPSflHICLDFIPELIAtJPELEtCtJIFAIflLLSHLCIlSYALPKSLSVARLA
SEG 

PRD ccceeeBeeeecccceeeeehhhhhhhhhhhhhhhhhhhhhhhhhhhhhccchhhhhhhh 
COILS 


PIEI1 

SE<2 VNVnGTLLTVLT(2AKRYAFFMPTLPSLVSFCRAFPPLYEDIMSLLI(2IGl2VCASDVATflT 
SEG 

PRD hhhhhhhhhhhhhhhhhhhhccccccceeeccccccchhhhhhhhhhhhcchhhhhcccc
COILS 

mem !..!!!!."!! 

SE(3 RDIDPIITRLtafllKEKPSGUSfllCICDSSYKNGSRDTGSIIDPDVflLCHCIERTVIEIINnS
SEG 

PRD cccchhhhhhhhhhhccccceeeeeccccccccccccccccceeeeeeeehhhhhhheee 
COILS 

nEn

SE(2 VSGI 
SEG • - • • 
PRD eccc 
COILS

hem — 



Prosite for DKFZphtes3_20h12.2

PS00029 167->188 LEUCINE_ZIPPER PDOC00029

PS00029 692->713 LEUCINE_ZIPPER PDOC00029

(No Pfam data available for DKFZphtes3_20h12.2)
group: testis derived

DKFZphtes3_21k14 encodes a novel 558 amino acid protein without
similarity to known proteins.

No informative BLAST results; no predictive Prosite, Pfam or SCOP
motifs.

The new protein can find application in studying the expression
profile of testis-specific genes.

unknown protein
perhaps complete cds.

Sequenced by LMU

Locus: unknown

Insert length: 2547 bp

Poly A stretch at pos. 2506, polyadenylation signal at pos. 2171



1 GGCCACGTTC AGCGGACACG GGAGCAAGAT GGCGATTCCG GGCAGGCAGT
51 ATGGGCTTAT TTTGCCAAAG AAAACACAGC AGTTGCACCC TGTTTTGCAA
101 AAACCATCAG TGTTTGGGAA TGATTCTGAT GATGATGATG AGACCTCTGT
151 GAGTGAAAGC CTTCAGAGGG AAGCTGCTAA GAAGCAGGCC ATGAAACAGA
201 CCAAACTGGA AATCCAGAAG GCCCTTGCAG AAGATGCTAC TGTGTATGAA
251 TATGACAGTA TTTATGATGA AATGCAGAAA AAAAAGGAGG AAAATAATCC
301 CAAATTGCTT TTGGGGAAAG ACAGAAAGCC CAAGTATATT CACAACTTGC
351 TAAAAGCAGT TGAGATCAGA AAAAAGGAAC AGGAAAAAAG AATGGAAAAG
401 AAAATACAGA GAGAACGAGA AATGGAAAAG GGGGAGTTTG ATGATAAAGA
451 AGCATTTGTG ACATCTGCAT ATAAGAAAAA ACTGCAAGAG AGAGCTGAAG
501 AAGAAGAAAG AGAAAAGAGG GCTGCTGCAC TGGAAGCATG TTTGGATGTA
551 ACCAAGCAGA AAGATCTCAG TGGATTTTAT AGGCACCTAT TAAATCAAGC
601 AGTTGGTGAA GAGGAAGTAC CTAAATGCAG CTTTCGTGAA GCCAGATCTG
651 GTATAAAGGA AGAAAAATCA AGGGGCTTCT CCAATGAAGT AAGTTCAAAA
701 AACAGAATAC CACAAGAGAA ATGCATTCTT CAAACTGATG TGAAAGTAGA
751 GGAAAACCCA GATGCAGACA GTGACTTCGA TGCTAAGAGC AGTGCGGATG
801 ATGAAATAGA AGAAACTAGA GTGAACTGCA GAAGGGAAAA GGTCATAGAG
851 ACCCCTGAGA ATGACTTCAA GCACCACAGG AGTCAAAACC ACTCTCGGTC
901 ACCTAGTGAA GAAAGAGGGC ACAGTACCAG GCACCACACG AAAGGATCAC
951 GAACGTCGAG AGGACATGAG AAAAGGGAAG ATCAGCACCA GCAGAAGCAA
1001 TCCAGAGACC AAGAGAACCA TTACACTGAC CGTGATTACC GGAAAGAAAG
1051 GGATTCTCAT AGGCACAGAG AGGCCAGTCA TAGAGATTCC CATTGGAAGA
1101 GGCATGAACA GGAAGATAAA CCAAGGGCGA GGGACCAAAG AGAAAGAAGT
1151 GACAGAGTAT GGAAAAGGGA GAAAGATAGG GAGAAATATT CCCAAAGAGA
1201 ACAAGAAAGA GATAGACAAC AAAATGATCA GAACCGACCC AGTGAGAAAG
1251 GAGAGAAGGA AGAGAAAAGC AAAGCAAAGG AAGAGCATAT GAAAGTAAGG
1301 AAGGAAAGAT ATGAAAATAA TGATAAATAC AGAGATAGAG AAAAACGAGA
1351 GGTAGGTGTT CAGTCTTCAG AAAGAAATCA AGACAGAAAG GAAAGCAGCC
1401 CAAATTCTAG GGCAAAGGAT AAATTTCTTG ACCAAGAAAG ATCCAACAAA
1451 ATGAGAAACA TGGCAAAGGA CAAAGAAAGA AACCAAGAGA AACCCTCTAA
1501 TTCTGAATCA TCACTGGGAG CAAAACACAG ACTCACAGAG GAAGGGCAAG
1551 AGAAGGGTAA AGAACAAGAG AGACCACCTG AGGCAGTGAG CAAGTTTGCA
1601 AAGCGGAACA ATGAAGAAAC TGTAATGTCA GCTAGAGACA GGTACTTGGC
1651 CAGGCAGATG GCGCGGGTTA ATGCAAAGAC CTATATTGAG AAAGAAGATG
1701 ATTGATGGCT ACCCCAAGAG AAAGATTTAA GGAAGCACAG AAAACTGTAA
1751 TTCCTGGAAC CTGCTGCGTA AAACCATAAA GGAGTGTGTT ACCAGTAGTT
1801 TGGAGGGCAT TTTTAAATTT ATTTTCAAAA TTTTAAGTTA AAAGTCAGTC
1851 TTACAGCTTG GATGTTTGGA TGTGGATGTT TGGCTGAATT TATATATAGT
1901 GTGTACTCAT CAATACCACA TTCTTTGTTG TATTCAAGAA CCGTTAAGAG
1951 TGTGCTAATT CCCTGTAGGT ACATAATGAG GAAAATTTGC TCCACTACAA
2001 CCATTAAAAA ATAATTTTGG CCAGATACGG TAGCTCGTGC CTGTAATACC
2051 AACATTTTGG GAGGCCAAGG CAGAAGGATA TTGAGGCTAG GCATTCAAGA
2101 CCAGCCTAGG CAGGATAATA AGACCTTGTC TCTATTTAAA AAACAAAAAG
2151 CCTAGCATGG TAGTCCATGC CTGTAGTCCC AGCTGTTCGA GAGGCTGAGG
2201 CAAGAAGATC ACTTGAGCCT AGGAATTTGA TGTTACAGTG AGGTATGATC
2251 ATGCCACTGC ACTCCAACCT GGGCAACAGA ATGAGACCCT GTCTCTAAAA
2301 AATTTTTTTT AAATAAATAA TTTAACTCTT CTAATAATGT TTTGTTGCAG
2351 GAAATGTATT TCAGATAAAA TATGGATTTG AAAAACAGAA AATATACTTT
2401 ATGTTCTGAA ATTTGTATTT AAGTATAAAA TGTGAATCAT CTTGTCTAAA
2451 TAGCTTACAG CATAGTTGGC TTAAATGAAA ATAAAATGAT ATGCTTATAC
2501 ATTTGGAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAG



BLAST Results 


No BLAST result 

Medline entries



No Medline entry 


Peptide information for frame 2 



ORF from 29 bp to 1702 bp; peptide length: 558
Category: similarity to unknown protein 
Classification: Nucleic acid management 



1 MAIPGRflYGL ILPKKTfiflLH PVLflKPSVFG NDSDDDDETS VSESLflREAA

51 KK<3ANK<2TKL EIC2KALAEDA TVYEYDSIYD EMflKKKEENN PKLLLGORK

101 PKYIHNLLKA VEIRKKEflEK RMEKKIflRER EMEKGEFDDK EAFVTSAYKK

151 KL<3ERAEEEE REKRAAALEA CLDVTKdKDL SGFYRHLLNfl AVGEEEVPKC

201 SFREARSGIK EEKSRGFSNE VSSKNRIPflE KCILflTDVKV EENPDADSDF

251 DAKSSADDEI EETRVNCRRE KVIETPENDF KHHRS(2NHSR SPSEERGHST

301 RHHTKGSRTS RGHEKREDC3H <3(2K<2SRD<2EN HYTDRDYRKE RDSHRHREAS

351 HRDSHUKRHE <2E1)KPRARI><2 RERSDRVIdKR EKDREKYSflR E<2ERDR(3<2ND

401 C3NRPSEKGEK EEKSKAKEEH MKVRKERYEN NDKYRDREKR EVGV<3SSERN

451 (2DRKESSPNS RAKDKFLDflE RSNKMRNMAK DKERNflEKPS NSESSLGAKH

501 RLTEEGflEKG KE(2ERPPEAV SKFAKRNNEE TVMSARDRYL ARfiMARVNAK

551 TYIEKEDD
BLASTP hits



PCT/IB01/02050



No BLASTP hits available 
Alert BLASTP hits for DKFZphtes3_21k14, frame 2

No Alert BLASTP hits found 

Pedant information for DKFZphtes3_21k14, frame 2






Report for DKFZphtes3_21k14.2






ELENGTH3 
(EHUD 
EpIJ 
EH0M0L3 
thaliana 
sequence 
CFUNCAT3 
YKRQ c 12c]l 
EFUNCAT3 
le-DS 
CFUNCAT3 
myristyl 
CS. 
EBL0CKS3 
CBL0CKS1 
EEC3 
EEC3 
EPIRKU3 
EPIRKbO 
EPIRKU3 
IPIRKIO 
EPIRKbO 
EPIRKbO 
EPIRKIO 
IPIRKIO 
EPIRKbO 
EPIRKbO 
EPIRKbO 
EPIRKbO 
EPIRKbO 
EPIRKbO 
CSUPFAH3 
ESUPFAI13 
Oh 

ESUPFAM3 

ESUPFAM3 

le-Ot. 

CSUPFAM3 

ESUPFAN3 

ESUPFAM3 

ESUPFAM3 

ESUPFAI13 

IE Kbl 3 

EKU3 



5L7 

b?2b2-fi c J 

fl • Tb 

TREMBL:AC0Qb233_m gene: n F12K2.m n n Arabidopsis 
chromosome II BAC F12K2 genomic sequence-i complete 
. 3e-ll 

04-^1 other transcription activities CS- cerevisiaen 
le-DS 

30-10 nuclear organization CS- cerevisiae-i YKR0T2c3 

0b-D7 protein modification (glycolsylationi acylationi 
ationi palmitylationi f arnesylation and processing) 
cerevisiae-. YKL201C3 le-QH 
PFQ07HfiF 

BLD11B2E Glycosyl hydrolases family 35 proteins 
2-7-1-37 Protein kinase 7e-0b 
5.^.1-2 SNA topoisomerase He-Db 
phosphotransferase 7e-0b 
pre-mRNA splicing le-Ob 
citrulline 3e-0b 
tandem repeat 3e-0b 
DNA binding He-Ob 
DNA replication He-Ob 
isomerase He-Ob 
ATP 3e-0b 

phosphoprotein le-Ob 
calcium binding 3e-0b 
alternative splicing 7e-0b 
P-loop 3e-0b 
EF hand 3e-0b 
hair 3e-0b 
DEAD/H box helicase homology 3e-0b 

unassigned Ser/Thr or Tyr-specific protein kinases He- 
calmodulin repeat homology 3e-Db 

unassigned ribonucleoprotein repeat-containing proteins 

unassigned DEAD/H box helicases 3e-0b 
trichohyalin 3e-0b 
protein kinase homology He-Ob 
eukaryotic type I DNA topoisomerase He-Ob 
ribonucleoprotein repeat homology le-Ob 
All_Alpha 

LOU_C0MPLEXITY 22-75 V. 
SEfl ATFSGHGSKMAIPGRflYGLILPKKTflflLHPVLflKPSVFGNDSDDDDETSVSESLflREAAK

SEG xxxxxxxxxxxxx 

PRD ccccccccccccccccceeeeccccccccccccccccccccccccccccchhhhhhhhhh

SEfl KflAMKflTKLEIflKALAEDATVYEYDSIYDEnflKKKEENNPKLLLGKDRKPKYIHNLLKAV 

SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhcccccccccchhhhhhhhhhchhhhhhccccchhhhhhhhhh 


SEfl EIRKKEflEKRMEKKIflREREflEKGEFDDKEAFVTSAYKKKLflERAEEEEREKRAAALEAC 

SEG xxxxxxxxxxxxxxxxxxxxxxx xxxxxxxxxxxxxxxxxxxxxx. 

PRD hhhhhhhhhhhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEfl LDVTKflOLSGFYRHLLNflAVGEEEVPKCSFREARSGIKEEKSRGFSNEVSSKNRIPflEK

SEG 

PRD hhhhhhhccchhhhhhhhhhhhccccccccccccchhhhhhhhhhhhhhhhhhhhhhhhh 

SEfl CILflTDVKVEENPDADSDFDAKSSADDEIEETRVNCRREKVIETPENDFKHHRSflNHSRS 

20 SEG xxxxxxxxxxxxxx 

PRD hhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 

SEfl PSEERGHSTRHHTKGSRTSRGHEKREDflHflflKflSRDflENHYTDRDYRKERDSHRHREASH 

SEG 

25 PRD ccccchhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhchh 

SEfl RDSHUKRHEflEDKPRARDflRERSDRVIdKREKDREKYSflREflERDRflflNDflNRPSEKGEKE 

SEG xxxxxxxxxxxxxxx • • xxxxxx 

PRD hhhhhhhhccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccccchhh 

30 

SEfl EKSKAKEEHMKVRKERYENNDKYRDREKREVGVflSSERNflDRKESSPNSRAKDKFLDflER 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhccchhhhhhhhhhhhhhhhchhhhhhhhhhhhhhhhhhhhh 

35 SEfl SNKMRNMAKDKERNflEKPSNSESSLGAKHRLTEEGflEKGKEflERPPEAVSKFAKRNNEET 

SEG xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhccchhhhhccchhhhhhhhhhhhhhhccccchhhhhhhccccch 

SEfl VMSARDRYLARflMARVNAKTYIEKEDD 

40 SEG 

PRD hhhhhhhhhhhhhhhhhchhhhhcccc 



(No Prosite data available for DKFZphtes3_21k14.2)

(No Pfam data available for DKFZphtes3_21k14.2)
group: testis derived

DKFZphtes3_22i11 encodes a novel 580 amino acid protein with similarity to the RCC1-like G exchanging factor RLG, UVR8 (UVB-resistance protein), of Arabidopsis thaliana and to the murine retinitis pigmentosa GTPase regulator.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific genes.

Homo sapiens chromosome 7q22 sequence, ORF4, extension

differences to genmodel of ORF4
differential splicing

Sequenced by LMU

Locus: /map="7q22"
Insert length: 2236 bp

Poly A stretch at pos. 2197, polyadenylation signal at pos. 2180

1 ACAATGCTCA GATCGGGAGG TGGAGCCAAT CAGGTCCAAC CAAGAGGAGG
51 GGACACCGGC ACTCCACTAG CAGGAAAACG GGCCGAGGGA CCGCAAGCAG
101 GGGGTGCCTA GTCCTCGTCC CCCAAAGACC AATCGTAAGC CAGATACAGG
151 CGAGTGACTG TCAAGAAGGC CAATTAGAGC CTCCGAAGGG AATCTGGACC
201 TGCCTCTTCT CTGAGGGACG GCTCTACCTA CCAATAGCAT GGGCGAGAAG
251 GCGGTCCCTT TGCTAAGGAG GAGGCGGGTG AAGAGAAGCT GCCCTTCTTG
301 TGGCTCGGAG CTTGGGGTTG AAGAGAAGAG GGGGAAAGGA AATCCGATTT
351 CCATCCAGTT GTTCCCCCCA GAGCTGGTGG AGCATATCAT CTCATTCCTC
401 CCAGTCAGAG ACCTTGTTGC CCTCGGCCAG ACCTGCCGCT ACTTCCACGA
451 AGTGTGCGAT GGGGAAGGCG TGTGGAGACG CATCTGTCGC AGACTCAGTC
501 CGCGCCTCCA AGATCAGGGT TCTGGAGTCC GGCCCTGGAA GAGAGCTGCC
551 ATTCTGAACT ACACGAAGGG CCTGTATTTC CAGGCATTTG GAGGCCGCCG
601 CCGATGTCTC AGCAAGAGCG TGGCCCCCTT GCTAGCCCAC GGCTACCGCC
651 GCTTCTTGCC CACCAAGGAT CACGTCTTCA TTCTTGACTA CGTGGGGACC
701 CTCTTCTTCC TCAAAAATGC CCTGGTCTCC ACCCTCGGCC AGATGCAGTG
751 GAAGCGGGCC TGTCGCTATG TTGTGTTGTG TCGTGGAGCC AAGGATTTTG
801 CCTCGGACCC AAGGTGTGAC ACAGTTTACC GTAAATACCT CTACGTCTTG
851 GCCACTCGGG AGCCGCAGGA AGTGGTGGGT ACCACCAGCA GCCGGGCCTG
901 TGACTGTGTT GAGGTCTATC TGCAGTCTAG TGGGCAGCGG GTCTTCAAGA
951 TGACATTCCA CCACTCAATG ACCTTCAAGC AGATCGTGCT GGTTGGTCAG
1001 GAGACCCAGC GGGCTCTACT GCTCCTCACA GAGGAAGGAA AGATCTACTC
1051 TTTGGTAGTG AATGAGACCC AGCTTGACCA GCCACGCTCC TACACGGTTC
1101 AGCTGGCCCT GAGGAAGGTG TCCCACTACC TGCCTCACCT GCGCGTGGCC
1151 TGCATGACTT CCAACCAGAG CAGCACCCTC TACGTCACAG ACCAGGGGGG
1201 AGTGTATTTT GAGGTGCATA CCCCAGGGGT GTATCGCGAT CTCTTTGGGA
1251 CCCTTCAAGC CTTTGACCCC CTGGACCAGC AGATGCCGCT TGCTCTCTCA
1301 CTGCCTGCCA AGATCCTATT CTGTGCTCTT GGCTACAACC ACCTTGGCCT
1351 GGTGGATGAA TTTGGCCGAA TCTTCATGCA AGGAAATAAC AGATACGGGC
1401 AGCTAGGAAC AGGGGACAAA ATGGACCGAG GGGAACCCAC ACAGGTTTGT
1451 TACCTGCAGC GGCCCATCAC CCTGTGGTGC GGCCTCAACC ACTCCCTGGT
1501 GCTGAGCCAG AGCTCAGAGT TCAGCAAGGA GCTGCTGGGC TGCGGCTGTG
1551 GGGC7GGGGG CCGCCTCCCA GGCTGGCCCA AGGGGAGTGC CTCCTTCGTC
1601 AAGCTCCAAG TCAAGGTCCC TCTGTGTGCC TGTGCCCTCT GTGCCACCAG
1651 GGAGTGCCTA TACATCCTGT CCAGCCACGA CATTGAGCAG CACGCCCCCT
1701 ATCGCCACCT GCCAGCCAGC AGGGTGGTGG GGACTCCTGA GCCCAGCCTG
1751 GGGGCCAGAG CACCCCAGGA CCCCGGGGGG ATGGCCCAGG CCTGCGAGGA
1801 GTACCTCAGC CAGATCCACA GTTGCCAAAC GTTGCAGGAC CGCACGGAGA
1851 AGATGAAGGA GATCGTAGGG TGGATGCCCC TGATGGCCGC ACAGAAGGAC
1901 TTCTTCTGGG AGGCCCTGGA CATGCTGCAG AGGGCTGAAG GkGGCGGGGG
1951 TGGTGTAGGG CCCCCAGCCC CTGAGACCTA ATCCCCCTCA TGCTAGCCTA
2001 GTCCCTGGAG GAGGGAGTCC GGCCCCAGGC CAGGGACTAA GGAGCAATGA
2051 CCATTGTGCA CATGCGTGTG GGAAGGGGTT GCTAGGGGGT GGGGACGGCT
2101 AACCAGGGTA AGAATGTTCA GGGGGCTGCC CAGGAGGGGC CCCCAACCTG
2151 ACTATCATGG ACAAGAGATT TGATGGATAG AATAAAAGGC TGCAGCGAAA
2201 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAG

BLAST Results

Entry AF053356 from database EMBL:
Homo sapiens chromosome 7q22 sequence, complete sequence;
Score = 2152, P = 0.0e+00, identities = 666/729
10 exons

Medline entries

No Medline entry

Peptide information for frame 2

ORF from 239 bp to 1978 bp; peptide length: 580
Category: similarity to unknown protein
Classification: no clue

1 MGEKAVPLLR RRRVKRSCPS CGSELGVEEK RGKGNPISIQ LFPPELVEHI
51 ISFLPVRDLV ALGQTCRYFH EVCDGEGVWR RICRRLSPRL QDQGSGVRPW
101 KRAAILNYTK GLYFQAFGGR RRCLSKSVAP LLAHGYRRFL PTKDHVFILD
151 YVGTLFFLKN ALVSTLGQMQ WKRACRYVVL CRGAKDFASD PRCDTVYRKY
201 LYVLATREPQ EVVGTTSSRA CDCVEVYLQS SGQRVFKMTF HHSMTFKQIV
251 LVGQETQRAL LLLTEEGKIY SLVVNETQLD QPRSYTVQLA LRKVSHYLPH
301 LRVACMTSNQ SSTLYVTDQG GVYFEVHTPG VYRDLFGTLQ AFDPLDQQMP
351 LALSLPAKIL FCALGYNHLG LVDEFGRIFM QGNNRYGQLG TGDKMDRGEP
401 TQVCYLQRPI TLWCGLNHSL VLSQSSEFSK ELLGCGCGAG GRLPGWPKGS
451 ASFVKLQVKV PLCACALCAT RECLYILSSH DIEQHAPYRH LPASRVVGTP
501 EPSLGARAPQ DPGGMAQACE EYLSQIHSCQ TLQDRTEKMK EIVGWMPLMA
551 AQKDFFWEAL DMLQRAEGGG GGVGPPAPET

PCT/IB01/02050

WO 01/98454 PCT/IB01/02050 

BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_22i11, frame 2

TREMBL:AF053356_11 product: "ORF4"; Homo sapiens chromosome 7q22 sequence, complete sequence, N = 1, Score = 1554, P = 1.6e-151



TREMBL:AF130441_1 gene: "UVR8", product: "UVB-resistance protein UVR8"; Arabidopsis thaliana UVB-resistance protein UVR8 (UVR8) mRNA, complete cds., N = 1, Score = 109, P = 0.0082

TREMBL:AF0MMb?7_l gene: "Rpgr", product: "retinitis pigmentosa GTPase regulator"; Mus musculus retinitis pigmentosa GTPase regulator (Rpgr) mRNA, complete cds., N = 1, Score = 106, P = 0.035

>TREMBL:AF053356_11 product: "ORF4"; Homo sapiens chromosome 7q22 sequence, complete sequence.
Length = 318



HSPs:

Score = 1554 (233.2 bits), Expect = 1.6e-151, P = 1.6e-151
Identities = 303/318 (95%), Positives = 303/318 (95%)



Query:   1 MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV 60
           MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV
Sbjct:   1 MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV 60

Query:  61 ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQGSGVRPWKRAAILNYTKGLYFQAFGGR 120
           ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQ               TKGLYFQAFGGR
Sbjct:  61 ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQD--------------TKGLYFQAFGGR 106

Query: 121 RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL 180
           RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL
Sbjct: 107 RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL 166

Query: 181 CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF 240
           CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF
Sbjct: 167 CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF 226

Query: 241 HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH 300
           HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH
Sbjct: 227 HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH 286

Query: 301 LRVACMTSNQSSTLYVTD 318
           LRVACMTSNQSSTLYVTD
Sbjct: 287 LRVACMTSNQSSTLYVTD 304
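The percentages in the HSP summary follow from the counts alone: 303 identical residues over an alignment of 318 columns. A minimal sketch of that arithmetic, assuming the percentage is truncated to a whole number (the report does not say whether it rounds or truncates; both give 95 for this alignment); `percent_identity` is a hypothetical helper:

```python
def percent_identity(identities: int, alignment_length: int) -> int:
    """Whole-number percent identity, truncated (an assumption, see above)."""
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    # Integer arithmetic avoids floating-point edge cases at exact boundaries.
    return 100 * identities // alignment_length


print(percent_identity(303, 318))  # 95
```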

Pedant information for DKFZphtes3_22i11, frame 2
Report for DKFZphtes3_22i11.2

[LENGTH] 580
ICIIIO bMfifll^T
EpIJ 1-DI

[HOMOL] TREMBL:AF053356_11 product: "ORF4"; Homo sapiens chromosome 7q22 sequence, complete sequence 1e-171
[BLOCKS] BL00625B Regulator of chromosome condensation (RCC1) proteins

[BLOCKS] BL00625A Regulator of chromosome condensation (RCC1) proteins

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 3.62 %

SEQ MGEKAVPLLRRRRVKRSCPSCGSELGVEEKRGKGNPISIQLFPPELVEHIISFLPVRDLV
SEG
PRD ccccchhhhhhhhhcccccccccccccccccccccceeeeccccchhhhhhheeeeeeee

SEQ ALGQTCRYFHEVCDGEGVWRRICRRLSPRLQDQGSGVRPWKRAAILNYTKGLYFQAFGGR
SEG
PRD ecccceeeeeeeecceeeeeeeeeecccccccccccccccccchhhhhhccceeeecccc

SEQ RRCLSKSVAPLLAHGYRRFLPTKDHVFILDYVGTLFFLKNALVSTLGQMQWKRACRYVVL
SEG
PRD eeeeccchhhhhhhheeeeccccceeeeeeeeeeecccceeeeeeccchhhhhhhhheee

SEQ CRGAKDFASDPRCDTVYRKYLYVLATREPQEVVGTTSSRACDCVEVYLQSSGQRVFKMTF
SEG
PRD ecccccccccccceeeeeehhhhhhhhccceeeeeccccceeeeeeeeecccceeeeeec

SEQ HHSMTFKQIVLVGQETQRALLLLTEEGKIYSLVVNETQLDQPRSYTVQLALRKVSHYLPH
SEG
PRD ccccceeeeeeeehhhhhhhhhhhhhcceeeeeeeccccccccceeeehhhhhhhhccce

SEQ LRVACMTSNQSSTLYVTDQGGVYFEVHTPGVYRDLFGTLQAFDPLDQQMPLALSLPAKIL
SEG
PRD eeeeeeccccccceeeecccceeeeeccccccccccceeeecccccccceeeeeccceee

SEQ FCALGYNHLGLVDEFGRIFMQGNNRYGQLGTGDKMDRGEPTQVCYLQRPITLWCGLNHSL
SEG
PRD eeeeccccceeeeeceeeeeecccccccccccccccccccceeeeeccceeeecccccee

SEQ VLSQSSEFSKELLGCGCGAGGRLPGWPKGSASFVKLQVKVPLCACALCATRECLYILSSH
SEG xxxxxxxxxxxxxx
PRD eeeeccccceeeeccccccccccccccccceeeeeeeeeeeeeeeeeeeecccceeeecc

SEQ DIEQHAPYRHLPASRVVGTPEPSLGARAPQDPGGMAQACEEYLSQIHSCQTLQDRTEKMK
SEG
PRD cccccccccccccceeeecccccccccccccccchhhhhhhhhhhhhcchhhhhhhhhhh

SEQ EIVGWMPLMAAQKDFFWEALDMLQRAEGGGGGVGPPAPET
SEG xxxxxxx
PRD hhhhcchhhhhhhhhhhhhhhhhhhhccccceeecccccc

(No Prosite data available for DKFZphtes3_22i11.2)

(No Pfam data available for DKFZphtes3_22i11.2)



group: testis derived

DKFZphtes3_22l24 encodes a novel 451 amino acid protein with similarity to the F-box protein FBL2 of the rat.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific genes.

similarity to p37NB (Homo sapiens)
Sequenced by LMU

Locus: /map="7q22-q31.1"
Insert length: 1537 bp

Poly A stretch at pos. 1451, no polyadenylation signal found

1 CAACAGGACG ATGCGACTCC TGCCGAGGCA CTTCCACAAC TTACAGAATC
51 TTAGTTTGGC TTATTGCAGA CGGTTCACAG ACAAAGGCTT ACAGTACCTG
101 AACTTGGGGA ATGGATGCCA CAAGCTCATC TATCTGGACC TCTCTGGCTG
151 CACCCAGATT TCAGTCCAAG GCTTCAGGTA CATTGCAAAC AGCTGCACTG
201 GAATTATGCA TCTTACCATT AATGACATGC CAACTCTGAC GGACAACTGT
251 GTAAAAGCTT TAGTTGAAAA ATGCTCTCGT ATTACATCGC TGGTTTTCAC
301 TGGTGCACCG CATATCTCCG ATTGTACTTT CAGAGCTCTT TCTGCTTGTA
351 AACTCAGAAA GATCCGATTT GAAGGAAATA AAAGGGTTAC TGATGCATCC
401 TTCAAATTTA TAGACAAGAA TTATCCAAAT CTCAGTCACA TTTATATGGC
451 TGACTGCAAG GGAATAACAG ACAGCAGCCT CAGATCCCTT TCACCTTTGA
501 AGCAACTGAC TGTGTTGAAT TTGGCAAATT GTGTAAGAAT TGGTGATATG
551 GGACTAAAGC AATTTCTTGA TGGTCCTGCA AGCATGAGGA TAAGAGAGCT
601 AAATTTAAGC AACTGTGTGC GGCTAAGTGA TGCCTTTGTT ATGAAACTAT
651 CTGAGCGCTG CCCTAATTTA AACTACTTGA GTTTACGAAA TTGTGAACAT
701 TTGACTGCCC AAGGAATTGG ATATATTGTA AACATCTTTT CCTTGGTATC
751 AATAGATCTC TCTGGAACAG ACATCTCTAA TGAGGGTTTG AATGTGCTTT
801 CCAGACATAA AAAATTGAAG GAACTTTCTG TATCTGAATG TTATAGAATC
851 ACTGATGATG GAATTCAGGC ATTCTGCAAA AGCTCACTGA TCTTGGAACA
901 TTTGGATGTC TCTTATTGCT CCCAGCTGTC AGATATGATT ATCAAAGCAC
951 TGGCCATTTA CTGCATTAAC CTCACATCTC TCAGCATTGC TGGCTGTCCA
1001 AAGATTACTG ACTCAGCAAT GGAGATGTTA TCGGCAAAAT GCCATTACCT
1051 GCACATTTTG GATATCTCTG GTTGTGTCTT GCTTACTGAC CAAATCCTTG
1101 AGGACCTTCA GATAGGCTGC AAACAACTCC GGATCCTTAA GATGCAATAC
1151 TGCACAAATA TTTCCAAGAA GGCAGCTCAA AGAATGTCAT CTAAAGTTCA
1201 GCAGCAGGAA TACAACACTA ATGACCCTCC ACGTTGGTTT GGCTATGATA
1251 GGGAAGGAAA CCCTGTTACA GAGCTTGACA ACATAACATC ATCTAAAGGA
1301 GCCTTAGAAT TAACAGTGAA AAAGTCAACA TACAGCAGTG AAGACCAAGC
1351 AGCGTGACCT TCAGCCTCAA GCAGGAAGAA CAAAAAATCA AGAACTTGGC
1401 AAGTTTTCTC CATTTGTTGC AAGTATGTTT ACTAGCTGAA TCTCAATAAC
1451 AATGTAAACA AGCAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
1501 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAG
Entry AC005250 from database EMBL:

Homo sapiens BAC clone RG316M05 from 7q22-q31.1, complete sequence;

Score = 830, P = 1.8e-121, identities = 160/1*53

Entry HS3B^D7 from database EMBL:
Human p37NB mRNA, complete cds.
Score = 318, P = 4.6e-04, identities = 70/76



Medline entries

97136875:

Kim D, LaQuaglia MP, Yang SY. A cDNA encoding a putative 37 kDa leucine-rich repeat (LRR) protein, p37NB, isolated from S-type neuroblastoma cell has a differential tissue distribution. Biochim Biophys Acta 1996 Dec 11;1309(3):183-8.

Peptide information for frame 2

ORF from 11 bp to 1354 bp; peptide length: 448
Category: similarity to known protein
Classification: unclassified

1 MRLLPRHFHN LQNLSLAYCR RFTDKGLQYL NLGNGCHKLI YLDLSGCTQI
51 SVQGFRYIAN SCTGIMHLTI NDMPTLTDNC VKALVEKCSR ITSLVFTGAP
101 HISDCTFRAL SACKLRKIRF EGNKRVTDAS FKFIDKNYPN LSHIYMADCK
151 GITDSSLRSL SPLKQLTVLN LANCVRIGDM GLKQFLDGPA SMRIRELNLS
201 NCVRLSDAFV MKLSERCPNL NYLSLRNCEH LTAQGIGYIV NIFSLVSIDL
251 SGTDISNEGL NVLSRHKKLK ELSVSECYRI TDDGIQAFCK SSLILEHLDV
301 SYCSQLSDMI IKALAIYCIN LTSLSIAGCP KITDSAMEML SAKCHYLHIL
351 DISGCVLLTD QILEDLQIGC KQLRILKMQY CTNISKKAAQ RMSSKVQQQE
401 YNTNDPPRWF GYDREGNPVT ELDNITSSKG ALELTVKKST YSSEDQAA
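The first residues of this peptide can be checked against the cDNA by translating codons from the ORF start at 11 bp. A minimal sketch, assuming the standard genetic code (the source does not name the translation table used) and using the first 40 bases of the insert as printed in the listing; `translate` and `CODON_TABLE` are hypothetical names, not part of any tool named in the source:

```python
# Standard genetic code, codons enumerated in T, C, A, G order (NCBI table 1).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def translate(cdna: str, start: int, n_codons: int) -> str:
    """Translate n_codons codons beginning at 1-based position start."""
    seq = cdna.replace(" ", "").upper()  # the listing groups bases in blocks of 10
    offset = start - 1
    codons = (seq[offset + 3 * c: offset + 3 * c + 3] for c in range(n_codons))
    return "".join(CODON_TABLE[codon] for codon in codons)


head = "CAACAGGACG ATGCGACTCC TGCCGAGGCA CTTCCACAAC"
print(translate(head, 11, 10))  # MRLLPRHFHN
```

Building the table from a 64-character string keeps the code compact; a stop codon translates to `*`.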



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_22l24, frame 2
No Alert BLASTP hits found

Pedant information for DKFZphtes3_22l24, frame 2
Report for DKFZphtes3_22l24.2

[LENGTH] 451

[MW] 50545.15

[HOMOL] TREMBLNEW:AFlflfc»E73_l product: "leucine-rich repeats containing F-box protein FBL3"; Homo sapiens leucine-rich repeats containing F-box protein FBL3 mRNA, complete cds. 8e-31
[FUNCAT] 11.01 stress response [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 03.01 cell growth [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] Oa-n cellular import [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 03.22 cell cycle control and mitosis [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 03.04 budding, cell polarity and filament formation [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 01.05.04 regulation of carbohydrate utilization [S. cerevisiae, YJR090c] 8e-20
[FUNCAT] 11.04 dna repair (direct repair, base excision repair and nucleotide excision repair) [S. cerevisiae, YJR052w] 3e-07
[FUNCAT] 30.10 nuclear organization [S. cerevisiae, YJR052w] 3e-07
3e-0? 

[BLOCKS] PROOOnB

[BLOCKS] PR003L4D

[BLOCKS] BP01121A

[BLOCKS] BP03743B

[PIRKW] tandem repeat 2e-18

[PIRKW] zinc finger 1e-07

[PIRKW] DNA binding 1e-07

[SUPFAM] leucine-rich alpha-2-glycoprotein repeat homology 2e-18

[SUPFAM] regulatory protein ESAG8c 1e-07
[KW] Alpha_Beta



SEQ NRTMRLLPRHFHNLQNLSLAYCRRFTDKGLQYLNLGNGCHKLIYLDLSGCTQISVQGFRY
PRD ccccccccccccccccceeeeeecccccccceeeecccccceeecccccccccccccccc

SEQ IANSCTGIMHLTINDMPTLTDNCVKALVEKCSRITSLVFTGAPHISDCTFRALSACKLRK
PRD ccccccceeeeeccccccccchhhhhhhhhhhccccccccccccccccccccccccceee

SEQ IRFEGNKRVTDASFKFIDKNYPNLSHIYMADCKGITDSSLRSLSPLKQLTVLNLANCVRI
PRD eeecccccccccccccccccccceeeeeeeccccccchhhhhhhhcccccccceeeeeec

SEQ GDMGLKQFLDGPASMRIRELNLSNCVRLSDAFVMKLSERCPNLNYLSLRNCEHLTAQGIG
PRD cccccccccccccccccceeeeccccccccccchhhhhhcccccccccccccccccccee

SEQ YIVNIFSLVSIDLSGTDISNEGLNVLSRHKKLKELSVSECYRITDDGIQAFCKSSLILEH
PRD eeccccceeeeeecccccccccchhhhhhcccccccccccccccchhhhhcccccccccc

SEQ LDVSYCSQLSDMIIKALAIYCINLTSLSIAGCPKITDSAMEMLSAKCHYLHILDISGCVL
PRD cceeecccccchhhhhhhhccccceeeeeecccccchhhhhhhhhhccceeeeecccccc

SEQ LTDQILEDLQIGCKQLRILKMQYCTNISKKAAQRMSSKVQQQEYNTNDPPRWFGYDREGN
PRD chhhhhhhhhhcchhhhhhcceeeeechhhhhhhhhhhhheeeccccccccccccccccc

SEQ PVTELDNITSSKGALELTVKKSTYSSEDQAA
PRD ccccccccccccceeeeeccccccccccccc

(No Prosite data available for DKFZphtes3_22l24.2)
(No Pfam data available for DKFZphtes3_22l24.2)
group: testis derived

DKFZphtes3_26g3 encodes a novel 1CH0 amino acid protein without similarity to known proteins.

No informative BLAST results; no predictive prosite, pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific genes.

similarity to C. elegans CDTDH.4

on genomic level encoded by HSDJIIAIT
perhaps complete cds.

Sequenced by EMBL

Locus: /map="6"

Insert length: 4562 bp

Poly A stretch at pos. 4550, polyadenylation signal at pos. 4515



1 GATTCAGTTA CTGAAGACTT AGATGCACCC TGGATGGGAA TTCAGAATCT
51 TCAGAGATCA GAGTCCAGTA AAATGGATAA ATATGAGACT GAAGAAAGCT
101 CTGTAGCAGG ACTTTCTAGC CCAGAGTTGA AAGTCAGACC TGCTGGTGCC
151 TCCAGTATTT GGTATACAGA AGGTGAAAAG CAGCTAACAA AATCTCTAAA
201 AGGAAAGAAT GAAGAATCAA ATAAATCCAA AGTTAAGGTT ACTAAGCTTA
251 TGAAAACAAT GAAATCTGAA AACACAAAAA AATTAATAAA ACAGAACTCT
301 AAGGATTCTG TGGTTTTGGT AGGCTACAAA TGTTTGAAAA GTACAGCATC
351 AAATGATCTC ATTAAATGCT TTGAAGGCAA TCCTTCACAT AGTCAGAAGG
401 AAGGTCTGGA TCCCACAATA TGTGGATATA ATTTTGACCC AAAGACCTAC
451 ATGAGACAGA CAAGTCAAAA GGAAGCTAGC TGTTTGCCAA CTAATACAGA
501 GAGAACTGAA CAAAAGTCTC CAGATATTGA AAATGTTCAA CCAGACCAGT
551 TTGATCCTTT GAACTCTGGC AACCTAAATC TTTGTGCAAA TTTGTCCATT
601 TCAGGTAAAC TTGATATCTC CCAGGACGAT AGTGAAATTA CACAAATGGA
651 ACACAATCTG GCATCCAGAA GGTCATCAGA CGATTGCCAT GATCATCAAA
701 CAACCCCATC TTTGGGAGTT AGAACAATTG AAATAAAGCC CAGTAATAAA
751 GATCCTTTCA GTGGAGAGAA TATAACTGTC AAACTAGGAC CTTGGACAGA
801 GCTTCGACAA GAGGAAATAC TTGTGGATAA TTTACTACCC AACTTTGAGT
851 CCTTAGAATC TAATGGTAAA TCTAAATCTA TAGAAATAAC ATTTGAAAAG
901 GAAGCTTTGC AAGAAGCAAA GTGTCTTTCT ATTGGAGAAT CATTAACTAA
951 ATTACGAAGT AATCTACCTG CCCCTTCTAC AAAAGAATAT CATGTTGTAG
1001 TAAGTGGAGA TACAATTAAG TTACCAGATA TTAGTGCCAC ATATGCCTCA
1051 TCTAGATTTT CAGATTCAGG TGTTGAAAGT GAACCGAGTT CTTTTGCGAC
1101 ACATCCAAAC ACTGATTTAG TCTTTGAAAC TGTGCAAGGG CAAGGTCCTT
1151 GCAATAGTGA AAGATTATTT CCTCAGCTTT TGATGAAACC TGATTATAAT
1201 GTAAAATTTT CATTAGGAAA TCATTGTACT GAGAGTACAA GTGCTATAAG
1251 TGAAATACAG TCATCTTTGA CATCCATAAA CTCTCTACCC TCCGATGATG
1301 AACTGTCACC TGATGAAAAT TCTAAGAAAT CTGTTGTACC TGAATGCCAT
1351 CTAAATGATA GCAAAACTGT ATTAAATCTA GGAACGACTG ATTTGCCAAA
1401 ATGTGATGAT ACTAAAAAGT CAAGTATCAC TTTGCAACAG CAGAGTGTTG
1451 TATTTTCAGG GAACTTGGAC AATGAAACTG TAGCAATACA TTCCTTAAAT
1501 TCAAGCATTA AAGACCCTTT ACAATTTGTT TTTTCAGATG AAGAGACTTC
1551 CAGTGATGTG AAAAGTAGTT GCAGCTCCAA ACCTAACTTG GATACTATGT
1601 GTAAAGGCTT CCAGAGTCCT GATAAATCTA ATAACTCTAC AGGGACAGCA
1651 ATTACATTAA ATTCAAAACT GATTTGTTTA GGCACTCCTT GTGTCATTTC
1701 AGGTTCCATT TCTAGTAATA CAGATGTTAG TGAAGATAGA ACTATGAAAA
1751 AAAATAGTGA TGTATTAAAT CTCACACAGA TGTATTCAGA AATCCCTACA
1801 GTTGAAAGTG AAACTCATCT GGGTACAAGT GATCCTTTTT CAGCCAGTAC
1851 TGATATAGTA AAGCAAGGGC TTGTGGAAAA TTATTTTGGT TCTCAAAGCA
1901 GTACGGATAT TTCTGACACA TGTGCTGTTA GCTACAGCAA TGCACTTAGC
1951 CCTCAGAAGG AAACTTCTGA AAAAGAAATT AGTAATCTTC AGCAGGAACA
2001 GGATAAAGAG GATGAGGAGG AAGAGCAGGA TCAACAAATG GTTCAAAATG
2051 GGTACTATGA AGAAACAGAT TATTCAGCTT TGGATGGAAC AATAAATGCT
2101 CACTATACAA GCAGAGATGA ACTAATGGAA GAAAGACTTA CAAAATCTGA
2151 AAAAATAAAC AGTGACTATC TGAGAGATGG TATAAACATG CCTACTGTCT
2201 GTACTTCTGG TTGTTTGTCC TTCCCGTCTG CACCACGAGA GTCTCCTTGT
2251 AATGTTAAAT ATTCTTCCAA AAGTAAATTT GATGCCATTA CAAAGCAGCC
2301 AAGCAGTACT TCTTACAACT TCACTTCTTC GATTTCCTGG TATGAAAGTT
2351 CACCAAAACC TCAAATACAA GCCTTCCTTC AGGCAAAAGA AGAACTGAAG
2401 CTACTAAAAC TTCCTGGGTT CATGTACAGT GAAGTTCCTC TGCTGGCATC
2451 CTCAGTACCT TATTTTAGTG TAGAAGAAGA GGGTGGTTCT GAAGATGGAG
2501 TACATCTGAT TGTCTGTGTG CACGGTTTAG ATGGAAACAG TGCAGATCTC
2551 CGATTAGTAA AAACTTACAT TGAACTTGGA TTGCCTGGGG GAAGAATTGA
2601 TTTTCTTATG TCTGAGAGAA ATCAGAATGA TACTTTTGCT GATTTTGATA
2651 GCATGACTGA TCGTCTTTTG GATGAGATAA TACAGTATAT TCAGATATAT
2701 AGTCTAACAG TCTCAAAAAT AAGCTTTATT GGACATTCGT TGGGCAATTT
2751 AATAATTCGT TCAGTGCTTA CAAGGCCAAG GTTTAAATAT TACCTCAACA
2801 AACTTCATAC CTTTCTGTCT CTTTCTGGAC CTCACCTTGG TACACTCTAC
2851 AACAGCAGTG CTCTTGTTAA TACAGGTCTC TGGTTTATGC AGAAATGGAA
2901 AAAATCAGGT TCGCTTTTGC AGCTGACATG TCGAGATCAC TCAGACCCTC
2951 GCCAAACTTT TTTATATAAG CTTAGTAACA AAGCAGGGCT TCATTATTTC
3001 AAAAATGTTG TGCTAGTGGG ATCCCTACAG GATCGCTATG TTCCTTATCA
3051 CTCTGCCCGC ATTGAAATGT GTAAAACAGC TTTAAAGGAC AAACAGTCAG
3101 GACAGATCTA TTCAGAAATG ATCCACAACT TGCTTCGACC CGTTCTGCAA
3151 AGCAAGGACT GTAATTTGGT TCGCTATAAT GTCATCAATG CATTGCCCAA
3201 TACAGCTGAT TCACTCATTG GGAGAGCTGC ACATATAGCT GTTCTTGATT
3251 CGGAAATATT TTTAGAGAAA TTCTTTCTGG TTGCTGCCCT CAAATATTTC
3301 CAATAGTATA AAAGCATTGT TAGCGACTGG ACAATTACCT CATTCAACAA
3351 TGTTTCAAAT AATGTATTAT ATTAAAATGT AGATGCTGAT AAGTTCTAAG
3401 AAATATTTAT ACCTTTTTAT ATGGAAGATA ATTTATATCA TCCATGTTTA
3451 GTGCTTTTTA AACATCAACT TTACTTTCTA GGTAATGTGG CTGTGCAATA
3501 TTTTTTTAAT TTTATCTTTT TACTTTTCTA TTACTTTTTC ATATATTTTG
3551 CTACCTAAGT ATTTCAGTGA AACTTTAAGC CCATACCTGT GTCTGATTGT
3601 TTATTATTGG CTTTCCACAA TTCTTACATC AGACTACATT ATATTAGAGA
3651 CCATTATTGC TAGAATAGCA TGGGATTTAA AATTTCTAAT ACTGGGGGTA
3701 TTATTTAGTT AATTATAAAT TTTTCTTTTC ACATTTTACT GTGTTTTAAC
3751 TGGAAATAAA ATTATGGCTG CTACAATATA TTTTTTGAAA TCAACTTCTG
3801 TAGTTCTAAA ATACAACTTT ATCATACAAT CAAACCAGGT AGTTCATATA
3851 AAACAGTGTA ATACAAGTTT TCTATAAAGT CATTACTGTT GCTTAAACAT
3901 ATTTCATGCC TATTAAAATA TATTTTCTAC TGGTGATTTC AACATTATTT
3951 CTCATACTGA CTTTTATTAC TGGAAATGTT CCTGTACATG TTGGCAGCAG
4001 ATAAAGATTT TTGAATGTTT GAATGCCCTC TGCCTTGATT TGGTTGGATT
4051 TTGCTAATTG GTATGTTGCT TGAACTTTAT GACTACATTT TCTTTTAACT
4101 TTTTTCATGG ACTTCCTTAT ATGTACATAA TAATTAAATG TTGAAATTTA
4151 TGAAATACTT TTATGAATTT AGATAATTTT TAAATATTGT TAAAATTTAT
4201 TGAACTAAAA AGTAATGTAA ATAAAATAAT TCATGTTAAA GATGGAACAA
4251 AATAATTAAC TTTACATGTT TGGTGATACA GATGCAAATG TTTTTGATAT
4301 ATGGAGATGT TGAGTCTTTT GACTTTACTA AAGGTGCTGA ATAGCATTAA
4351 ATTCACTATT TTCCTTTTCT GTTTTACTTG TGAAAATAAA AATGCACTAA
4401 GGTTGGGTAG AAGTTCTGTT TGCACTCACT AATTGTGACA GACAGAGGTT
4451 TTTGTAAGTA TTTATTGTAC AATTGATGCA TGTTTATTTT TAGCGTTGTT
4501 ATTGCCTCTG GTGTTAATAA ATGAACAAAT GGCTATCTGG AGGAACAGCT
4551 AAAAAAAAAA AA



BLAST Results

Entry HSDJllflll from database EMBLNEW:

Human DNA sequence *** SEQUENCING IN PROGRESS *** from clone
Score = 7251, P = 0.0e+00, identities = 1455/14feil

Medline entries

No Medline entry

Peptide information for frame 1

ORF from 34 bp to 3303 bp; peptide length: 1090
Category: similarity to unknown protein
Classification: no clue

1 MGIQNLQRSE SSKMDKYETE ESSVAGLSSP ELKVRPAGAS SIWYTEGEKQ
51 LTKSLKGKNE ESNKSKVKVT KLMKTMKSEN TKKLIKQNSK DSVVLVGYKC
101 LKSTASNDLI KCFEGNPSHS QKEGLDPTIC GYNFDPKTYM RQTSQKEASC
151 LPTNTERTEQ KSPDIENVQP DQFDPLNSGN LNLCANLSIS GKLDISQDDS
201 EITQMEHNLA SRRSSDDCHD HQTTPSLGVR TIEIKPSNKD PFSGENITVK
251 LGPWTELRQE EILVDNLLPN FESLESNGKS KSIEITFEKE ALQEAKCLSI
301 GESLTKLRSN LPAPSTKEYH VVVSGDTIKL PDISATYASS RFSDSGVESE
351 PSSFATHPNT DLVFETVQGQ GPCNSERLFP QLLMKPDYNV KFSLGNHCTE
401 STSAISEIQS SLTSINSLPS DDELSPDENS KKSVVPECHL NDSKTVLNLG
451 TTDLPKCDDT KKSSITLQQQ SVVFSGNLDN ETVAIHSLNS SIKDPLQFVF
501 SDEETSSDVK SSCSSKPNLD TMCKGFQSPD KSNNSTGTAI TLNSKLICLG
551 TPCVISGSIS SNTDVSEDRT MKKNSDVLNL TQMYSEIPTV ESETHLGTSD
601 PFSASTDIVK QGLVENYFGS QSSTDISDTC AVSYSNALSP QKETSEKEIS
651 NLQQEQDKED EEEEQDQQMV QNGYYEETDY SALDGTINAH YTSRDELMEE
701 RLTKSEKINS DYLRDGINMP TVCTSGCLSF PSAPRESPCN VKYSSKSKFD
751 AITKQPSSTS YNFTSSISWY ESSPKPQIQA FLQAKEELKL LKLPGFMYSE
801 VPLLASSVPY FSVEEEGGSE DGVHLIVCVH GLDGNSADLR LVKTYIELGL
851 PGGRIDFLMS ERNQNDTFAD FDSMTDRLLD EIIQYIQIYS LTVSKISFIG
901 HSLGNLIIRS VLTRPRFKYY LNKLHTFLSL SGPHLGTLYN SSALVNTGLW
951 FMQKWKKSGS LLQLTCRDHS DPRQTFLYKL SNKAGLHYFK NVVLVGSLQD
1001 RYVPYHSARI EMCKTALKDK QSGQIYSEMI HNLLRPVLQS KDCNLVRYNV
1051 INALPNTADS LIGRAAHIAV LDSEIFLEKF FLVAALKYFQ
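The ORF coordinates, reading frame and peptide length in these entries are related by simple arithmetic. A sketch, assuming 1-based inclusive coordinates with the stop codon excluded from the ORF (an assumption consistent with the 1090 residues listed for the ORF from 34 bp to 3303 bp, which is followed by a TAG codon); `orf_summary` is a hypothetical helper, not a tool named in the source:

```python
from typing import Tuple


def orf_summary(start: int, end: int) -> Tuple[int, int]:
    """Return (reading frame, peptide length) for a 1-based, inclusive ORF.

    Assumes the end coordinate is the last base of the last sense codon,
    i.e. the stop codon is not counted.
    """
    span = end - start + 1
    if span % 3 != 0:
        raise ValueError(f"ORF span of {span} bp is not a whole number of codons")
    frame = (start - 1) % 3 + 1  # forward-strand frames 1-3
    return frame, span // 3


print(orf_summary(34, 3303))  # (1, 1090)
```

The same rule gives frame 2 and 448 residues for the ORF from 11 bp to 1354 bp, matching the 448-residue listing for the F-box-like clone.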

55 



BLASTP hits
No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_26g3, frame 1

No Alert BLASTP hits found

Pedant information for DKFZphtes3_26g3, frame 1

Report for DKFZphtes3_26g3.1

[LENGTH] 1101

CMliU 122BM5 - SH

[pI] 5.12

[HOMOL] TREMBL:CEAF211b_l gene: "CDTD 1 * . H"; Caenorhabditis elegans cosmid C01D4 2e-38

[FUNCAT] unclassified proteins [S. cerevisiae, YOR051c] 2e-06

[BLOCKS] BL00120B

[KW] Alpha_Beta

[KW] LOW_COMPLEXITY 6.72 %



SEQ DSVTEDLDAPWMGIQNLQRSESSKMDKYETEESSVAGLSSPELKVRPAGASSIWYTEGEK
SEG
PRD ccccccccccceeeeechhhhhhhhhhccccccccccccccceeeeccccceeeccccch

SEQ QLTKSLKGKNEESNKSKVKVTKLMKTMKSENTKKLIKQNSKDSVVLVGYKCLKSTASNDL
SEG xxxxxxxxxxxxxxx
PRD hhhhhhccccccccceeeehhhhhhhhhcccccceeecccccceeeeeeeeccccccccc

SEQ IKCFEGNPSHSQKEGLDPTICGYNFDPKTYMRQTSQKEASCLPTNTERTEQKSPDIENVQ
SEG
PRD eeeecccccccecccccccccccccccccccccccccccccccccccccccccccccccc

SEQ PDQFDPLNSGNLNLCANLSISGKLDISQDDSEITQMEHNLASRRSSDDCHDHQTTPSLGV
SEG
PRD ccccccccccceeecccccccccccccccccccchhhhhhhcccccccccccccccccee

SEQ RTIEIKPSNKDPFSGENITVKLGPWTELRQEEILVDNLLPNFESLESNGKSKSIEITFEK
SEG
PRD eeeeecccccccccccceeeccccchhhhhhhhhhhccccccccccccccceeeehhhhh

SEQ EALQEAKCLSIGESLTKLRSNLPAPSTKEYHVVVSGDTIKLPDISATYASSRFSDSGVES
SEG
PRD hhhhhhhhhhhhhhhhhhhhccccccccceeeeecccccccccccccccccccccccccc

SEQ EPSSFATHPNTDLVFETVQGQGPCNSERLFPQLLMKPDYNVKFSLGNHCTESTSAISEIQ
SEG
PRD ccccccccccceeeeeeeccccccccccccccccccccceeeeecccccccccchhhhhh

SEQ SSLTSINSLPSDDELSPDENSKKSVVPECHLNDSKTVLNLGTTDLPKCDDTKKSSITLQQ
SEG
PRD cccccccccccccccccccccccccccccccccccceeecccccccccccccccceeecc

SEQ QSVVFSGNLDNETVAIHSLNSSIKDPLQFVFSDEETSSDVKSSCSSKPNLDTMCKGFQSP
SEG xxxxxxxxxx
PRD eeeeeecccccceeeeeeeccccccceeeeeccccccceeeccccccccccccccccccc

SEQ DKSNNSTGTAITLNSKLICLGTPCVISGSISSNTDVSEDRTMKKNSDVLNLTQMYSEIPT
SEG
PRD cccccccccccccceeeeeeeccceeeeecccccccccccccccccchhhhhhheeeeec

SEQ VESETHLGTSDPFSASTDIVKQGLVENYFGSQSSTDISDTCAVSYSNALSPQKETSEKEI
SEG
PRD cccccccccccccccceeeeeeeeeeeecccccccccceeeeeecccccccccccccccc

SEQ SNLQQEQDKEDEEEEQDQQMVQNGYYEETDYSALDGTINAHYTSRDELMEERLTKSEKIN
SEG xxxxxxxxxxxxxxxx
PRD cchhhhhcccchhhhhhhhhhcccccccccccccccceeeeccchhhhhhhhhhhhhccc

SEQ SDYLRDGINMPTVCTSGCLSFPSAPRESPCNVKYSSKSKFDAITKQPSSTSYNFTSSISW
SEG xxxxxxxxxxxx
PRD ccccccccccccccccceeecccccccccceeeecccccceeeeeccccccceeecceee

SEQ YESSPKPQIQAFLQAKEELKLLKLPGFMYSEVPLLASSVPYFSVEEEGGSEDGVHLIVCV
SEG xxxxxxxxx xxxxxxxxxxxx
PRD ccccccccchhhhhhhhhhhhhccccceeeeeeeeeecccceeeeeccccccceeeeeee

SEQ HGLDGNSADLRLVKTYIELGLPGGRIDFLMSERNQNDTFADFDSMTDRLLDEIIQYIQIY
SEG
PRD eccccccchhhhhhhhhhhccccccchhhhhccccccccccccchhhhhhhhhhhhhhhh

SEQ SLTVSKISFIGHSLGNLIIRSVLTRPRFKYYLNKLHTFLSLSGPHLGTLYNSSALVNTGL
SEG
PRD hccccccccccccceeeeeeeeccccchhhhhhhhhhccccccccceeeeccccccccch

SEQ WFMQKWKKSGSLLQLTCRDHSDPRQTFLYKLSNKAGLHYFKNVVLVGSLQDRYVPYHSAR
SEG
PRD hhhhhhhhhheeeeeecccccccceeeeeeccccceeeeeeeeeeeccccccceeehhhh

SEQ IEMCKTALKDKQSGQIYSEMIHNLLRPVLQSKDCNLVRYNVINALPNTADSLIGRAAHIA
SEG
PRD hhhhhhhccccccchhhhhhhhhhhccccccccceeeeeeecccccccccchhhhhhhhh

SEQ VLDSEIFLEKFFLVAALKYFQ
SEG
PRD hhhhhhhhhhhhhhhhhhccc
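Each Pedant report pairs the sequence (SEQ) with a low-complexity mask (SEG) and a per-residue secondary-structure prediction (PRD). A sketch for summarising these lines, assuming the usual three-state reading of PRD (h = helix, e = extended strand, c = coil) and that an `x` in a SEG line marks a masked residue; under that reading, a low-complexity percentage like the one in the keyword lines would be the masked count divided by the full sequence length. Both helpers are hypothetical:

```python
from collections import Counter
from typing import Dict, List


def prd_composition(prd: str) -> Dict[str, float]:
    """Percent of residues in each predicted state: h (helix), e (strand), c (coil)."""
    states = [ch for ch in prd if ch in "hec"]
    if not states:
        raise ValueError("no PRD states found")
    counts = Counter(states)
    return {state: 100.0 * counts[state] / len(states) for state in "hec"}


def low_complexity_percent(seg_masks: List[str], length: int) -> float:
    """Percent of residues flagged 'x' by SEG across all mask lines."""
    masked = sum(mask.count("x") for mask in seg_masks)
    return 100.0 * masked / length


print(prd_composition("hhhheecc"))  # {'h': 50.0, 'e': 25.0, 'c': 25.0}
print(round(low_complexity_percent(["x" * 74], 1101), 2))  # 6.72
```

Because the extracted SEG lines have lost their column alignment, only counts (not positions) can be recovered from them.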



(No Prosite data available for DKFZphtes3_26g3.1)
(No Pfam data available for DKFZphtes3_26g3.1)
group: signal transduction

DKFZphtes3_21f24 encodes a novel 526 amino acid protein with similarity to murine net1a.

The closely related mNET1 activates signalling pathways in addition to those directly controlled by activated RhoA. The novel protein is expressed ubiquitously.

The new protein can find application in modulating or blocking signalling pathways.

similarity to net1a (Mus musculus)
perhaps complete cds.
Sequenced by BNFZ

Locus: /map="72.40 cR from top of Chr3 linkage group"

Insert length: 3559 bp

Poly A stretch at pos. 3534, polyadenylation signal at pos. 3513
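The poly-A and polyadenylation-signal positions reported for each insert can be located programmatically. A sketch under stated assumptions, not the method used to produce these reports: the signal is taken to be the canonical AATAAA hexamer, the poly-A stretch is the terminal run of at least `min_tail` A's (allowing up to two trailing non-A bases, since some inserts end in G), and positions are 1-based. The exact position convention of the reports is not stated, so results may differ from them by a base. `polya_features` is a hypothetical helper:

```python
import re
from typing import Optional, Tuple


def polya_features(cdna: str, min_tail: int = 10) -> Tuple[Optional[int], Optional[int]]:
    """Return 1-based (signal position, poly-A start) for a cDNA insert."""
    seq = cdna.replace(" ", "").upper()
    # Terminal run of A's, optionally followed by up to two non-A bases.
    tail = re.search(r"A{%d,}[^A]{0,2}$" % min_tail, seq)
    if tail is None:
        return None, None
    tail_start = tail.start()
    # Overlapping search for AATAAA upstream of the tail; keep the last hit.
    hits = [m.start() for m in re.finditer(r"(?=AATAAA)", seq[:tail_start])]
    signal = hits[-1] + 1 if hits else None
    return signal, tail_start + 1


example = "GCGC" + "AATAAA" + "TGCATGCATGCT" + "A" * 15 + "G"
print(polya_features(example))  # (5, 23)
```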



1 CGCCGCCGCC CGGCATCGTG GAGCTGGGGC CCCCTTTTGC CTGGGAGTTT
51 TGTAGTCGCC TAGGGTCAGC GGTGACATCC CAAAGGGCAG GCCCGGCAGC
101 CGCCATGGTG GCCAAGGATT ACCCCTTCTA CCTCACGGTC AAGAGAGCGA
151 ACTGCAGCCT GGAGCTACCC CCGGCCAGCG GTCCGGCCAA GGACGCTGAG
201 GAGCCTAGTA ATAAACGGGT CAAACCCCTT TCCCGAGTCA CGTCGCTAGC
251 AAACCTCATC CCGCCCGTGA AGGCCACGCC ATTAAAGCGC TTCAGTCAAA
301 CCCTGCAGCG CTCCATTAGC TTCCGCAGTG AGAGCCGCCC TGACATCCTC
351 GCCCCCCGAC CCTGGTCCAG AAATGCCGCC CCCTCGAGCA CGAAACGGAG
401 AGATAGCAAG CTGTGGAGTG AGACCTTCGA TGTGTGCGTC AATCAGATGC
451 TTACATCCAA GGAAATCAAA CGTCAGGAGG CGATCTTTGA GCTTTCCCAA
501 GGAGAAGAAG ACTTGATAGA AGACTTGAAA TTAGCAAAAA AGGCCTATCA
551 TGACCCCATG CTGAAACTCT CCATAATGAC AGAACAAGAG TTGAATCAAA
601 TTTTTGGAAC ACTGGACTCT CTAATTCCTC TACATGAAGA GCTCCTTAGT
651 CAGCTTCGAG ATGTTAGGAA GCCTGATGGC TCGACTGAAC ATGTTGGTCC
701 CATCCTCGTG GGCTGGCTCC CTTGCCTCAG CTCCTATGAT AGCTACTGCA
751 GCAATCAAGT AGCCGCCAAA GCTCTGCTGG ACCACAAAAA GCAAGATCAC
801 CGAGTCCAGG ATTTCCTACA GCGATGTTTA GAATCCCCCT TTAGCCGCAA
851 ACTAGATCTC TGGAATTTCC TCGATATTCC AAGAAGCCGC CTGGTAAAAT
901 ACCCTCTGCT TCTCCGAGAA ATCTTGAGGC ACACACCAAA TGATAATCCA
951 GATCAGCAGC ACTTGGAAGA AGCTATAAAT ATCATTCAGG GAATTGTGGC
1001 AGAAATCAAC ACCAAGACTG GTGAATCTGA ATGCCGCTAT TATAAAGAGC
1051 GGCTTCTTTA CTTGGAAGAA GGCCAGAAAG ACTCCCTGAT CGACAGCTCT
1101 CGAGTCTTGT GTTGTCATGG TGAACTGAAG AACAATCGGG GCGTGAAACT
1151 GCATGTTTTC CTGTTCCAAG AAGTGCTTGT GATCACTCGA GCCGTCACCC
1201 ACAATGAGCA GCTTTGCTAC CAGCTGTACC GTCAGCCAAT CCCCGTGAAA
1251 GACCTCCTGC TGGAAGACCT CCAGGATGGA GAAGTGAGGC TGGGTGGCTC
1301 CCTGCGAGGG GCATTCAGCA ACAATGAGAG AATTAAAAAC TTCTTCAGAG
1351 TCAGTTTCAA AAATGGATCC CAAAGTCAGA CCCACTCGCT ACAAGCCAAT
1401 GACACTTTCA ACAAACAGCA GTGGCTTAAC TGTATTCGTC AAGCCAAAGA
1451 AACAGTTTTG TGTGCTGCCG GGCAAGCTGG GGTGCTTGAC TCCGAGGGAT
1501 CGTTCCTAAA TCCCACCACC GGGAGCAGAG AGCTACAGGG AGAAACAAAA
1551 CTTGAGCAGA TGGACCAATC GGACAGTGAG TCAGACTGTA GTATGGACAC
1601 GAGTGAGGTC AGCCTCGACT GTGAGCGCAT GGAACAGACA GACTCTTCCT
1651 GTGGAAACAG CAGGCACGGT GAAAGTAACG TCTGACAGAA GCATGTGCAC
1701 TTCGGGAAGC AGGCCTGCAT CTTACCTGTA CAGTATTTGC ATTCCACAGA
1751 TGGAACGGTT TGGAGAAGCA CTTTTTCATA CTTTTGTGAA AGTATACATG
1801 TTGGCCCAGT CTCTCGTATC TGTACCTTTG TCCCTAGTAC TGTAACTGCC
1851 AATCTGTCTG TGTAAGCTGG AATCTGTGGC AACTATTACC CTGTGTTGTA
1901 TTTCCCAAGT GTCTGGATGG ATGGAGAGGT ACTCAAACAA GTTACTTTCA
1951 GTTGTCCTGC TGGATTTTAA AAAAATAGAA AAAGAATCTC AAAACTACTG
2001 TTTTACATAG ATTGTTTGAA GAGTCCTTCC TCTTGTGCTT CTGTACCACT
2051 TTCCCAGCTC TTAGATGTGG TAGCTAAAGG CACGGAATTT AGACGGCCTT
2101 GTAAATAGGG CATGAGGAAC TCATCTGTGT ATTGGGATGG TATTAGAGAG
2151 AGAATCAGGA AAGACCAACT CATGAAGTGA ACTTGGTTTG ATCTTACTCA
2201 ACTAGAAAGC TTGAAAACAT CCCTGGGGAT TCTGAAGGCT TAATTTTGCA
2251 AAGGAGGATG CATTGTCTGA ACTTTGCAAC TTCATCCAGT GCAAGTTTGA
2301 TGCAAGAATG TATTAGGACA TAAAATAGAG GCTGACCTTA AAAGGGCCAG
2351 GACAGAAGCG GCTGCCAGCT CTGAATCTTT AACTGAAATG CACATGGCAC
2401 CAGGAGGTGT CTCTCATAGT TGGTTGCTAG CCTAAAACAT CAGAATAGAA
2451 CCCAAAGGGC TTAGGAAGGC CTGCCAGGAT AACAAGAAGG CCCTGTATTC
2501 ATTGTGTTTC ATCTGCCTAG GCCTACTCAT TATTTTAGAG AATGAATGAA
2551 GCAACAAGGA AGAGAGACCA TGACTCTATC GATGACACTG TTTATAGAAA
2601 CACAGGAGAG GAAGAATTTG GAATGAAAAG CACTTCGTCA GAACCTTCTG
2651 TGGGAGCCAT TGAGAGAAAA GCATGGTCCA GTGCCTTCTG AGAAAGGCCA
2701 GAGCTTTGGG CTTTCCTGCT CTGCTTTTGG GTCGTCAATT TGCCATCTCT
2751 GGTTCTGTGC TATAATCAGA ATTGTAATTA TGTTCTCCAG AGGCCAATTT
2801 CATTAACTCT GATTAATTAG AATCAGCTAG CCAGATTAGT AACCTCTTTG
2851 TCCAGCCTTG ATTTACAGTG CAGGGTAAAG TGCAGACCTT AAAAACAGCT
2901 AAGTACCTAG AAGAGCTCCC TGCAAGTGTA AATATTAAGG ATGACCTGTG
2951 CAAAATTATA CCCACACCAG CACTAGTGGT AATTATTCTA AATTATTGCC
3001 AAAAAGTTTT TTTTAATCTG TCTTTCAAGT TTACAGAAAA GAAAGCAGTA
3051 AATGCATTGA TGTCATTTTA TTATGTACAT ATATCATGTG CATTCAAGCT
3101 GTGTGACAAG ATATATCAAT ATAAAAACAA GGTATATACT TTATTATTTT
3151 TTGAAAACAA GGATATTGTG ATCAATTTTA CCCTGTAAAA CATATTTCTG
3201 TATTTATAGG TCTTAAACAT GATGAATTTT TTCTATTACA AGTTTATTTA
3251 AAACTGCTTT CTCAAGTCGT TATTGATACA GCAAGTGAAC CTGCTGCAGA
3301 CAGAAGCAGA GGAAAGCCAA GAACAGCCTT TATTGGTGAA GAAAAGAATG
3351 AATGATTCTT TGTAGGCGCC ATCAGCCACT TTTAGAAGCC ATCAGCCAGT
3401 GTGTTGGGAA AAGAGGTTTG TCAAGTGTTG GCCTATGGGA AGGTGGTCAA
3451 TGAATGTTTT GATGAAATGA ATGTTTTTGT ATAATGGCCT TAAACTTTTC
3501 TGGAAGTATT TCAAATAAAT TACATTATTA AGTCAAAAAA AAAAAAAAAA
3551 AAAAAAAAA

45 

BLAST Results 



No BLAST result 

50 

Medline entries 



55 ia33fc.l1fc»: 

Alberts ASi Treisman R.i Activation of RhoA and SAPK/JNK 

signalling 

pathways by the 
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RhoA-specif ic exchange factor mNETl- EHB0 J ma Jul 
15U7(m):4D75-fl5 



Peptide information for frame 3 

10 ORF from IDS bp to lb82 bpi peptide length: 52b 
Category: strong similarity to known protein 
Classification: Cell signaling/communication 

1 MVAKDYPFYL TVKRANCSLE LPPASGPAKD AEEPSNKRVK PLSRVTSLAN 

15 SI LIPPVKATPL KRFSflTLflRS ISFRSESRPD ILAPRPWSRN AAPSSTKRRD 

101 SKLWSETFDV CVNfiMLTSKE IKRflEAIFEL SflGEEDLIED LKLAKKAYHD 

151 PPILKLSIUTE (2ELNC3IFGTL DSLIPLHEEL LS(2LRDVRKP DGSTEHVGPI 

201 LVGWLPCLSS YDSYCSNUVA AKALLDHKKd DHRV(2DFL(2R CLESPFSRKL 

251 DLUNFLDIPR SRLVKYPLLL REILRHTPND NPDflflHLEEA INIK3GIVAE 

20 301 INTKTGESEC RYYKERLLYL EEGfiKDSLID SSRVLCCHGE LKNNRGVKLH 

351 VFLFCJEVLVI TRAVTHNEflL CYflLYRflPIP VKDLLLEDLfl DGEVRLGGSL 

401 RGAFSNNERI KNFFRVSFKN GS<2S<2THSL<2 ANDTFNKtidJU LNCIRCAKET 

451 VLCAAGC3AGV LDSEGSFLNP TTGSRELflGE TKLEflMDflSD SESDCSI1DTS 
501 EVSLDCERME CTDSSCGNSR HGESNV 

25 



BLASTP hits 

30 No BLASTP hits available 

Alert BLASTP hits for »KFZphtes3_2^f 24 -, frame 3 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_2Tf 24i frame 3 



35 



40 



Report for DKFZphtes3_21f 24 . 3 



[LENGTH! 5b0 
EllliD b3202-B5 



EpIJ b-04 

45 EH0H0L3 TREMBL: AFDT4 520_1 gene: "Netl"", product: "NETl 

homology Mus musculus NETl homolog (Netl) mRNA-i complete cds- 
le-lb2 

CFUNCAT3 01-01 biogenesis of cell wall IS- cerevisiae! 

YLR3?lw3 3e-lb 

50 EFUNCAT3 03-07 pheromone response! mating-type determination! 
sex-specific proteins IS. cerevisiae! YLR371w3 3e-lb 
CFUNCAT3 ID- 02-01 regulation of g-protein activity ES- 
cerevisiae! YLR371w3 3e-lb 

EFUNCATl 01-04 biogenesis of cy toskeleton ICS • cerevisiae! 
55 YLR371w3 3e~lb 

EFUNCAT3 03-04 budding! cell polarity and filament formation 
IS- cerevisiae! YLR371w3 3e-lb 
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[FUNCAT2 D1.0S.DW regulation of carbohydrate utilization IS • 
cerevisiaei YLR371uO 3e-lb 

[FUNCATH 30*. 03 organization of cytoplasm IS . cerevisiae-i 

YALD41w3 3e-ll 

5 [FUNCATJ [13-22 cell cycle control and mitosis [S. cerevisiae-i 

YALOmuO 3e-ll 

[FUNCAT]! 1D.0S.DT regulation of g-protein activity IS- 
cerevisiae-i YALOMluO 3e-ll 

[BLOCKS] PRD0510E 

10 [BLOCKS! PROODmE 

[BLOCKS! BLQ0741B 

[PIRKUJ breakpoint cluster region le-Db 

[PIRKliO transmembrane protein 5e-13 

[PIRKli)]! brain 3e-0b 

15 [PIRKUI signal transduction Se-13 

[PIRKUJ alternative splicing le-Qb 

[SUPFAfO CDCS4 homology =le-15 

[SUPFAM1 SHB homology le-ll 

[SUPFAMU CDC25-type guanine nucleotide exchange activator 

20 homology Ee-Ofl 

[SUPFAMJ dbl transforming protein le-Dfl 

[SUPFAH3 protein kinase C zinc-binding repeat homology le-ll 

[SUPFAPD SH3 homology le-ll 

[SUPFAro bcr protein le-Ob 

25 [SUPFAro pleckstrin repeat homology ae-11 

[SUPFAPU vav transforming protein le-ll 

[KIO All_Alpha 

30 SEA PPPGIVELGPPFAUEFCSRLGSAVTSURAGPAAAMVAKDYPFYLTVKRANCSLELPPASS 
PRD cccceeeeccccccchhhhhhhhhhhhhcccccccccccccceeeecccccccccccccc 

SE<2 PAKDAEEPSNKRVKPLSRVTSLANLIPPVKATPLKRFS<2TL(2RSISFRSESRPDILAPRP 

PRD cccccccccccccccccccccccccccccccccccchhhhhhcccccccccccccccccc 

35 

SE<2 WSRNAAPSSTKRRDSKLUSETFDVCVNflflLTSKEIKRflEAIFELSiSGEEDLIEDLKLAKK 

PRD cccccccccchhhhhhhhhhhcccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 

SEU AYHDPriLKLSiriTEtJELNfllFGTLDSLIPLHEELLSflLRDVRKPDGSTEHVGPILVGIJLP 
40 PRD hhhchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccccccccceeeeeccc 

SE(3 CLSSYDSYCSNflVAAKALLDHKKiJDHRVflDFLiJRCLESPFSRKLDLIilNFLDIPRSRLVKY 
PRD cccceeecccchhhhhhhhhhhhcchhhhhhhhhhhcccccccccccceeeccccccchh 

45 SE(2 PLLLREILRHTPNDNPDflflHLEEAINIIflGIVAEINTKTGESECRYYKERLLYLEEGflKD 
PRD hhhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhcccc 

, SE<2 SLIDSSRVLCCHGELKNNRGVKLHVFLFUEVLVITRAVTHNEflLCYjJLYRflPIPVKDLLL 

PRD hhhhhhheeecccccccccccceeeeehhhhhhhhhhhchhhhhhhhhhhhccccccccc 

50 

SE<2 EDL(2DGEVRLGGSLRGAFSNNERIKNFFRVSFKNGS<3S(2THSL(3ANDTFNK(J(2ULNCIR(2 

PRD ccccccccccccccchhhhhhhhhhhhheeeeccccchhhhhhhhcccchhhhhhhhhhh 

SE(3 AKETVLCAAGt2AGVLDSEGSFLNPTTGSREL(2GETKLE<2f1Di2SDSESDCSMDTSEVSLDC 
55 PRD hhhhhhhhhccceeeeccccccccccccchhhhhhhhhhhhhhccccccccccccccccc 

SEfl ERUEflTDSSCGNSRHGESNV 
PRD cccccccccccccccccccc 



i" 
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(No Prosite data available for DKFZphtes3_2Tf 2M -3) 
5 (No Pfam data available for J>KFZphtes3_21f 2H • 3) 
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DKFZphtes3_3Dpb 



PCT/TO01/02050 



5 group: testis derived 

DKFZphtes3_3Dpb encodes a novel Mbl amino acid protein without 
similarity to known proteins. 

10 No informative BLAST resultsi No predictive prositei pfam or SCOP 
motif e • 

The new protein can find application in studying the expression 
profile of testis-specif ic genes. 

15 

similarity to Celegans FIIHIQ.I 
perhaps complete cds- 

20 

Sequenced by LMU 

Locus: unknown 

25 Insert length: bp 

Poly A stretch at pos- lllli no polyadenylation signal found 



1 GGAACAGACC ACTGGGCTGG CAGCTGAGTT GCAGCAGCAG CAGGCTGAGT 

30 51 ACGAGGACCT TATGGGACAG AAAGATGACC TCAACTCCCA GCTCCAGGAG 

101 TCATTACGGG CCAATAGTCG ACTGCTGGAA CAACTTCAAG AAATAGGGCA 

151 GGAGAAGGAG CAGTTGACCC AGGAATTACA GGAGGCTCGG AAGAGTGCGG 

E01 AGAAGCGGAA GGCCATGCTG GATGAGCTAG CAATGGAAAC GCTGCAAGAG 

251 AAGTCCCAGC ACAAGGAAGA GCTGGGAGCA GTTCGTCTAC GGCATGAGAA 

35 301 GGAGGTGCTG GGGGTGCGTG CCCGCTATGA GCGTGAGCTC CGAGAGCTGC 

351 ATGAAGACAA GAAGCGTCAG GAGGAGGAGC TCCGTGGGCA GATCCGGGAG 

l»D1 GAGAAGGCCC GGACACGGGA GCTGGAGACT CTCCAGCAGA CAGTGGAAGA 

MSI ACTTCAAGCT CAGGTACATT CCATGGATGG AGCCAAGGGC TGGTTTGAAC 

501 GGCGCTTGAA GGAAGCCGAG GAATCCCTGC AGCAGCAGCA GCAGGAACAA 

40 551 GAGGAAGCCC TCAAGCAGTG TCGGGAGCAG CACGCTGCCG AGCTGAAGGG 

bDl CAAGGAGGAG GAGCTACAGG ATGTACGGGA TCAGCTCGAG CAGGCCCAGG 

bSl AGGAGCGGGA CTGCCACCTG AAGACCATTA GCAGCCTGAA GCAGGAGGTG 

701 AAGGACACAG TGGATGGGCA GAGGATCCTG GAGAAGAAGG GCAGTGCTGC 

751 GCTCAAGGAC CTCAAGCGGC AGCTGCATTT GGAGCGGAAA CGGGCAGATA 

45 SDl AGCTGCAGGA GCGACTGCAG GACATCCTCA CTAACAGCAA GAGCCGCTCA 

S51 GGCCTTGAGG AGCTGGTTCT CTCAGAGATG AACTCACCAA GCCGGACCCA 

101 GACAGGGGAC AGCAGTAGCA TCTCCTCCTT CAGCTACCGG GAGATCTTGC 

151 GGGAAAAGGA GAGCTCGGCT GTTCCAGCCA GGTCCTTATC CAGCAGCCCT 

1001 CAAGCCCAGC CCCCTCGGCC AGCAGAGCTG TCAGATGAGG AAGTGGCTGA 

50 . 1051 GCTCTTTCAG CGGCTGGCAG AGACACAGCA GGAGAAATGG ATGCTGGAGG 

1101 AGAAGGTGAA GCACCTGGAA GTGAGCAGTG CTTCCATGGC AGAGGACCTC 

1151 TGCCGGAAGA GCGCCATCAT TGAGACCTAC GTCATGGACA GCCGGATCGA 

1S01 TGTGTCTGTG GCAGCAGGCC ACACAGACCG CAGCGGGCTG GGCAGCGTCC 

1251 TGAGAGACCT AGTGAAGCCA GGCGACGAGA ACCTTCGGGA GATGAACAAG 

55 1301 AAGCTGCAGA ACATGCTGGA GGAGCAGCTC ACCAAGAATA TGCACTTGCA 

1351 CAAGGATATG GAAGTTCTGT CCCAGGAAAT TGTGCGGCTC AGCAAGGAGT 

1101 GCGTGGGGCC TCCTGACCCA GACCTAGAGC CAGGAGAAAC CAGCTAAAGA 

mSl CCTGCAGGCT GCACCCACCT CCTCCCCTTC CTACCCCCTA GGATGCTATT 
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10 



20 



WO 01/98454 

1S01 CCCTTGGGCT 

1SS1 TGGGAGACTG 

lbOl ATCCTGTCTT 

lbSl GGCAGGGATT 

17D1 TCATCTCTGC 

1751 TTTTATTTTT 

1601 GGGGATGCTG 

1SS1 GTGTGTGTGT 

1101 CGGGCCCACC 



GTGGTGGAAA 
GACATTAAAG 
AGGGCAGAGG 
TCTCCTTCTT 
ATGAGCTCTC 
TAATTTATGT 
GGTGGGTGTG 
GTGTGTAAAG 
CACAAAAAAA 



AATGAGGGCT 
GGGCTAGAGG 
CCACCAGGGA 
CTTGGTCCTG 
CTTCCCAGAG 
CTGGAGCCTG 
TGGTCCATGT 
GCTATGCAGC 
AAAAAAAAAA 



GGAGCCAAAA 
CCTGATGGTT 
GTGGGGATCC 
GCTCCCAAGG 
ACCAACTCTT 
GCTACTCTGC 
TCAGCGTTCT 
CAAAATACCA 
AAAAAAAAAA 



BLAST Results 



15 No BLAST result 



Medline entries 



PCT/1B01/02050 

TCAAATAGCT 
AGTGTTAATG 
TGAGGGAAGG 
GCTTCTGTCT 
TTTTATTTTA 
ATTTGGGATT 
AGCAACACGT 
TCTGGCCAGA 
AAAG 



No Medline entry 



25 



Peptide information for frame 2 



30 



35 



40 



ORF from b2 bp to mMH bpi peptide length: m,l 
Category: similarity to unknown protein 
Classification: no clue 



1 MG<2KDDLNS<2 
51 AMLDELAMET 
101 KR(2EEELRG<3 
151 EAEESLflflfld 
201 CHLKTISSLK 
251 RLdDILTNSK 
301 SSAVPARSLS 
351 HLEVSSASMA 
101 VKPGDENLRE 
MSI PDPDLEPGET 



L<2ESLRANSR 
LflEKSCHKEE 
IREEKARTRE 
c2E<2EEALK<3C 
(2EVKDTVDG13 
SRSGLEELVL 
SSPtfAdPPRP 
EDLCRKSAII 
MNKKLlJNMLE 
S 



LLE<2L(2EIG<2 
LGAVRLRHEK 
LETLfiflTVEE 
RECHAAELKG 
RILEKKGSAA 
SEMNSPSRTfl 
AELSDEEVAE 
ETYVMDSRID 
EflLTKNMHLH 



EKE(2LT(2EL<2 
EVLGVRARYE 
LfiAcJVHSMDG 
KEEELflDVRD 
LKDLKRflLHL 
TGDSSSISSF 
LF(2RLAETlJ<3 
VSVAAGHTDR 
KDMEVLSflEI 



EARKSAEKRK 
RELRELHEDK 
AKGUFERRLK 
<2LE(2A<2EERD 
ERKRADKLflE 
SYREILREKE 
EKUMLEEKVK 
SGLGSVLRDL 
VRLSKECVGP 



45 



50 



55 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_30pbi frame 2 
No Alert BLASTP hits found 

Pedant information for DKFZphtes3_30pbi frame 2 

Report for DKFZphtes3_30pb -2 
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CLENGTH3 4fll 
EMWJ S531fl.l0 
EpIJ S-D7 

EHOMOLJ TREMBL : CEFM1H1D_M gene: n FHlHlD.1 n i Caenorhabditis 

5 elegans cosmid FMIHID- Ee-12 

CFUNCATl 30.03 organization of cytoplasm ES. cerevisiaei 
YDLOSfluO 5e-04 

EFUNCATH Ofl-O? vesicular transport (golgi network-i etc) ES. 
cerevisiaei YDL05fiw3 5e-04 
10 EBL0CKSJ BL01100D NNI1T/PNI1T/TEI1T family of methyltransf erases 
proteins 

CIC U]) All_Alpha 

EKUJ L0W_C0I1PLEXITY 1=1.13 '/. 

EKliO C0ILED_C0IL 40.1b '/. 

15 

SE<2 EflTTGLAAELflflflflAEYEDLnGflKDDLNSflLflESLRANSRLLEflLflEIGflEKEflLTflELfl 

SEG xxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD ccchhhhhhhhhhhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
20 COILS 

■ . -CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEfl EARKSAEKRKAMLDELAnETLflEKSflHKEELGAVRLRHEKEVLGVRARYERELRELHEDK 

SEG x 

25 PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh 
COILS 

CCCCCCCC 

SEfl KRflEEELRGfllREEKARTRELETLflflTVEELflAflVHSflDGAKGUFERRLKEAEESLflflflfl 

SEG xxxxxxxxxxxxxxxx xxxxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhh 
COILS 

• CCCCCCCCCCCCCCCC 

SEfl flEflEEALKflCREflHAAELKGKEEELflDVRDflLEfiAflEERDCHLKTISSLKflEVKDTVDGfl 

SEG xxxxxxxxx 

PRD hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccc 
COILS 

CCCCCCCCCCCCC CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEfl RILEKKGSAALKDLKRflLHLERKRADKLflERLflDiLTNSKSRSGLEELVLSENNSPSRTfl 

SEG 

PRD cccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhccccchhhhhhhhhhhccccccc 
COILS 

45 CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 

SEfl TGDSSSISSFSYREILREKESSAVPARSLSSSPflAflPPRPAELSDEEVAELFflRLAETflfi 

SEG • • -xxxxxxxx xxxxxxxxxxxxxxxxxxxxx 

PRD cccccccchhhhhhhhhhhcccccccccccccccccccccchhhhhhhhhhhhhhhhhhh 
50 COILS 



SEfl EKblMLEEKVKHLEVSSASIIAEDLCRKSAIIETYVIIDSRIDVSVAAGHTDRSGLGSVLRDL 

SEG 

55 PRD hhhhhhhhhhhhhhchhhhhhhhhhhhhhhcccccchhhhhhhccccccccccccccccc 
COILS 



-370- 



WO 01/98454 PCT/IB01/02050 

SE<2 VKPGBENLREriNICKL(3NI1LEE(3LTKNI1HLHKDHEVLS(3EIVRLSKECVGPPDPDLEPGET 

SEG 

PRD ccccchhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccccccccc 
COILS 

CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC 



sEta s 

SEG . 
PRD c 
10 COILS 



(No Prosite data available for DKFZphtes3_30pLi • 5) 
15 (No Pfam data available for DKFZphtes3_3Dpb -2) 
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DKFZphtes3_31al0 



PCT/IB01/02050 



5 group: nucleic acid management 



DKFZphtes3_31alO encodes a novel 542 amino acid protein with 
similarity to histone HI of Drosophila hydei. 

10 Histone HI variants are known to act as specific regulators of 
genes via the differential condensation of DNA ■ 

The new protein can find application in modulating/blocking the 
transcriptional activity and in expression profiling- 

15 

weak similarity to Drosophila histone HI 
perhaps complete cds- 

20 

Sequenced by LflU 



Locus: /map="13" 



25 Insert length: 283? bp 

Poly A stretch at pos- 2fl55-» polyadenylation signal at pos- 2331 



1 AGATGATCCC 

30 SI AAAACAATAG 

101 GCTAACATGC 

1S1 TCAGTCTAAG 

201 GTTCTGCAGC 

251 CCTCAGCCTG 

35 3D1 CAATATGACT 

351 AACTTGTGCG 

401 GTGAAACAAG 

451 GCCTCATGAA 

501 AAACCAGTTC 

40 551 ATAGCATCTG 

bOl GATGGAAAAG 

t.51 CAATTGTTGA 

701 AAAGCTCGTC 

751 GCCCCCTAAT 

45 flOl AACTAGTTGG 

fi51 TTATTTACTG 

101 TAATGAGGGA 

151 TTAAAAATAT 

1001 GCACTTATTG 

50 1051 TGAGAAAGCC 

1101 CGATTGTAGA 

1151 GAAAATATGG 

1201 TATTGAAGAT 

1SS1 GTAAACTTCA 

55 1301 AACAAAACAA 

1351 GAGGACAAGT 

1401 AAAGTGTGAA 

1451 GAGCTGAAGT 



CAAAGTCAAC ATATGACATT 
TAAAAAGAAA CAAATGACTA 
CCAAGAAACC TGTGCTTGGA 
ATTAATTCAT TTAGAAAACC 
AACAAAGAAA CTTTCAGCCA 
TAAACACCAG CAGTGTAACA 
GCCACTACTA AATTTGTGAG 
ACCTCCTATT AGAAGTCATC 
GCATCAGTAG AACCTCTGCC 
AAAGAACTAT TACAATCAAA 
TTCTCAAGGT ATAATAAGAA 
AAGTTGTAGC CAGGCCTGCT 
TCAGAGCCCG TTGACCAGCG 
TAGTAGATCA GCTCAGCCCA 
TGAGTGAGTG GAAAGCTGGC 
TCAGTAGTTA CTCAGCATGA 
GTCTTTTTGG ACTACCATGG 
AAAAAGTAAA CAACACATTT 
TGTCCAAAAG AAGATATACT 
TCCAGATGCC AAAAAGCTTG 
AACCAATCAC AAGTCCTATT 
ATTCTGGCAG GGGCTCAGCC 
TATTCTAACA ATGAAGAGTC 
AGAAGTCTTG TGCAAGCAAG 
ACAGGTGTTG ATGTAGATCC 
TAGAAATTTG CTATTTCAAG 
AAGATCCAAC CCATGATGTT 
TGCTTAATTA AATATAATGT 
AAAAAAGGTG CAGTTTGATG 
TTTTAACACC AGTGAGACGT 



AAGCCAGGCA TTTCACCTTA 
CAGAAAAACA AAAGCAAGAT 
TCTTATCGTG GCCAGATTGT 
TCTACAAGTC AAAGATGAGA 
CTATACCTAA AGCCACAAAA 
GTGAAAAGTA ATAGATCCTC 
CACTACATCT CAGAACACAC 
ACAGTAATAC CCGGGACACT 
AATGTTACAA TCCGGAAAGG 
AACAGCTTTA TCTAGTGTCA 
ATAAGACTCT ATCAAGATCC 
TCATTGTCTA ATGATAAACT 
AAGACATACT GCAGGAAAAG 
AAGAAACCTC GGAAGAGAGA 
AAAGGAAGAG TGCTAAAAAG 
GCCTGCAGGA CAAAATGAAA 
CAGAAGAAGA TGAACAAAGA 
TCTGAATGCC TGAACTTGAT 
GGTCACACTG AATGACCTGA 
TTAAGTATTG GATATGTCTT 
GAAAATATTA TTGCAATCTA 
TATTGAAGAG ATGCGACACA 
AAGAAAAAGC TAATTTAGGA 
GAAGAAGTCA AAGAAGTCAG 
AGAAAAACTG GAAATGGAGA 
ATTGTGAAAA AGAGCAAGAC 
AAAACCCCCA ATACAGAAAC 
GTGTACTACG CCATACTTGC 
GAACAAATTC CGCATTTAAA 
TCTCGACGTC TTCAAGAGAA 



-372- 



WO 01/98454 PCT/IB01/02050 

1SD1 AACTTCTAAA TTGCCAGATA TGTTAAAAGA TCATTATCCT TGTGTGTCTT 

1SS1 CATTGGAACA GCTAACGGAG TTGGGAAGAG AAACTGATGC TTTTGTATGC 

lfc.01 CGCCCTAATG CAGCACTGTG CCGGGTGTAC TATGAGGCTG ATACAACATA 

1L51 AGAGAAATAA AGCTCTGTTA GGGAAT6GGG TTTTTATTAT TTGTGGGGTG 

5 17D1 TTTTGTTTTG AGTAGCTTTA TATTGCTCTT AGGTCTGGAG TTGGCCATGT 

1751 ACCTATGTAT CCTAAGCATT CACGGCAGTG AGCTCCTTTA CTAACATTCA 

IflOl TGTTATGGCA AGAGTTGTCC TCTACATTGG AAAGCTAATC CTACCTTGTC 

1651 AGTTTCAACC AACTGAGTTT TTTCTTTAAG AAAGGTAAAT TTTGTCAGCT 

nOl AGTTTACTAT GTTCCTTGAA TATAAACAGG TTATAATACT ACCCTGTTCA 

10 1T51 CTTTACTAAA TATAAGTACA GTAATGATGC ATAATTAGAA AATGAGGTAT 

E0D1 TCTAGGTAAA ATGTATGTTT GCCTTGACAT GTTTTTAAAA GTTATGATGT 

2051 ACCTCCCTGC CTTTAAACAG AATACTTTTT TCTTTTTTTT GGCCTTTCTC 

B1D1 AGATTAGTCA AAAATTCTAT AGAATGACTC ACTTCGAATA CTAAGACACA 

2151 GGAGGTTTAG CCTGCTTTCT TACCAAATTC ATGTTACCCA GACTTGTGTT 

15 2201 CTCTTGCGTC CCTTGGACTG CCTGTTGATT GATGGAAAGT GTCTGCACTG 

2251 ACACTTTTCG TCAGTAGTCT GTAGTTTCGT GGCCTCTTTT GATTATAACT 

2301 GGGGTCACCA AGAAGGTTTA CTTAATTAAA TACCGCATTT CTAAGAGAAG 

2351 ATACTTTGTG TAAGAAAAGA TGCCACATTT AGTGGTTTAA CTTTTGTAAC 

2M01 TTCACTTGAT A6TTTTTAAG CAATTAGAAT GGAGTTAGGG AAAGAACATA 

20 2MS1 TCATACTGAA CAAATGTCAT TCTAGTTTAG ATAGCATTTC TAAGATAACT 

2501 GATACTAATA CTTGTTTTCT TCCCTATAAC ATAAAAAACT TCACTGTTAA 

2551 GTCATGTCCC TTGAAACATG ATAGTTACAT ACACAGTTTT CTCTCCACAC 

2b01 ATAAATAACA CCACTAAAGT TGTTTTGTAA GGTTCCAAAC TAATATGGCA 

2b51 TATATCAACT CTACAGTTTC AAATAAATGA CTTTTTAATT GTAAAAGATT 

25 2701 AGTTGAAAAA CTGTATGAAT GTGAAGATCA CATGCTTAGT CATTTTTATG 

2751 TTCATTCCAC TTTGTATATC TTTTCTATTT ATTGACTTC.T CATGTTCTAG 

2B01 AGAGTAGGAC TTTTATTCCG TGTACCTGAT ATATATACAA TTAAAATATC 

2B51 TGTATAATTA AAAAAAAAAA AAAAAAAAAA AAAAAAG 



30 



35 



45 



50 



BLAST Results 



No BLAST result 

Medline entries 

40 No fledline entry 



Peptide information for frame 2 



ORF from 23 bp to IbMfl bp} peptide length: 5M2 
Category: similarity to known protein 
Classification: unclassified 



1 MTLSfiAFHLK NNSKKKiJUTT EKdJKflDANMP KKPVLGSYRG (JIVflSKINSF 

51 RKPLflVKDES SAATKKLSAT IPKATKPflPV NTSSVTVKSN RSSNMTATTK 

101 FVSTTSflNTfl LVRPPIRSHH SNTRDTVKCG ISRTSANVTI RKGPHEKELL 

151 (3SKTALSSVK TSSSUGIIRN KTLSRSIASE VVARPASLSN DKLMEKSEPV 

55 201 DflRRHTAGKA IVDSRSAdPK ETSEERKARL SEUKAGKGRV LKRPPNSVVT 

251 <2HEPAGfiNEK LVGSFWTTMA EEDEfiRLFTE KVNNTFSECL NLINEGCPKE 

301 DILVTLNDLI KNIPDAKKLV KYUICLALIE PITSPIENII AIYEKAILAG 

351 A(3PIEEHRHT IVDILTMKSfl EKANLGENNE KSCASKEEVK EVSIEDTGVD 
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M01 VDPEKLEflES KLHRNLLFC3D CEKEflDNKTK DPTHDVKTPN TETRTSCLIK 

YNVSTTPYLfl SVKKKVflFDG TNSAFKELKF LTPVRRSRRL (2EKTSKLPDI1 

501 LKDHYPCVSS LEULTELGRE TDAFVCRPNA ALCRVYYEAD TT 



BLASTP hits 

No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_31alD-. frame E 
No Alert BLASTP hits found 
15 Pedant information for DKFZphtes3_31alCh frame 2 



10 



20 



55 



Report for DKFZphtes3_31al0.2 



CLENGTH3 SIT 

£ Hid J bib??. 3b 

EpIJ T.33 

EKIO Alpha_Beta 

25 EKliO L0ld_C0MPLEXITY V. 

SE<3 DI>PUS(JHf1TLSl3AFHL<NNSKKK(2l1TTEK(JKi2DANriPKKPVLGSYRGlJIV(3SKINSFRKP 

SE6 xxxxxxxxxxxx 

30 PRD ccccccchhhhheeeeccccccchhhhhhhhhccccccccccccccceeeeccccccccc 

SE<2 LtfVKDESSAATKKLSATIPKATKPtfPVNTSSVTVKSNRSSNriTATTKFVSTTSflNTlJLVR 

SEG 

PRD cccccchhhhhhhhhhccccccccccccceeeeeccccccccccceeeeeccccceeeec 

35 

SE<2 PPIRSHHSNTRDTVKfiGISRTSANVTIRKGPHEKELLflSKTALSSVKTSSStJGIIRNKTL 

SEG 

PRD cccccocccccccccccccccceeeeeccccchhhhhhhhhhcccccccccceeecccch 

40 SEfi SRSIASEVVARPASLSNDKLMEKSEPVDfiRRHTAGKAIVDSRSAflPKETSEERKARLSEti) 

SEG 

PRD hhhhhheeeecccccchhhhhhhcccchhhhhhcceeecccccccccchhhhhhhhhhhh 

SE<3 KAGKGRVLKRPPNSVVTCJHEPAGdNEKLVGSFIJTTn'AEEDEflRLFTEKVNNTFSECLNLI 

45 SEG 

PRD hcccceeeeccccceeeeccccccceeeeeecchhhhhhhhhhhhhhhhccccccceeec 

SEfl NEGCPKEDILVTLNDLIKNIPDAKKLVKYUICLALIEPITSPIENIIAIYEKAILAGAflP 

SEG , 

50 PRD ccccccceeeeecccceeecccchhhhhhhhhhhhcccccccchhhhhhhhhhhhhcchh 

SE(2 IEEilRHTIVDILTMKSCJEKANLGENMEKSCASICEEVKEVSIEDTGVDVDPEKLEIIESKLH 

SEG 

PRD hhhhhhhhhhhhhhhhhhhhhccchhhhhcccccceeeeeecccccccccchhhhhhhhh 



SE(3 RNLLFiJDCEICEiSDNKTKDPTHDVKTPNTETRTSCLIICYNVSTTPYLfiSVKICKVflFDGTNS 
SEG 

PRD cccccccccccccccccccccccccccccccceeeeeeecccccchhhhhhheeecccch 
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SE<2 AFKELKFLTPVRRSRRLdEKTSKLPDriLKDHYPCVSSLEflLTELGRETDAFVCRPNAALC 
SEG 

PRD hhhhhhhchhhhhhhhhhhhhhccccccccccccchhhhhhhhhccccceeeecccceee 

5 

SE<2 RVYYEADTT 
PRD eeeeccccc 

10 

(No Prosite data available for DKFZphtes3_31al0.2> 
(No Pfam data available for DKFZphtes3_31alD - E) 
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5 group: signal transduction 

DKFZphtes3_31 jED encodes a novel 312 amino acid protein that 
contains a Protein phosphatase EC motif- 



10 The novel protein shares identity withthe rat protein 

phosphatase EC and is expressed ubiquitously. PPEC is a 
structurally diversified protein phosphatase family with a wide 
range of functions in cellular signal transduction- The 
transcription of the PPECdelta gene was activated in response to 

15 stress-, like alcohol or UV irridation- PPEC plays a role in cell 
cycle control- 

The new protein can find application in and the diagnosis/therapy 
of stress related diseases and cancer-, as well as a for 
20 modulation of cell cycle and signal transduction- 



strong similarity to protein phosphatase EC (Rattus norvegicus) 

25 Sequenced by LMU 

Locus: unknown 

Insert length: lM3b bp 
30 Poly A stretch at pos- 13b?-. polyadenylation signal at pos- 13M1 



1 CGCTGCTCGC 

51 CCGCCATGGA 

35 101 CCGGCTGCCG 

151 CCCTCCGGCC 

501 ATGATCTCCC 

ESI ATATCCCAGA 

301 CGAGGAAGAG 

40 351 AAGCCTCTTC 

M01 GGTGAGAGGG 

M51 CGAGGAGTGT 

501 CTGTTTTTGA 

551 AATTTGCATC 

45 fc.01 TGTAGAGAAA 

b51 ATGAAGAGTT 

701 GGGTCCACTG 

751 CAACCTCGGA 

601 AAAAACATGC 

50 SSI GAAGAGCGGA 

=101 TGTTTTGGGC 

151 AGCGCTGCGG 

10D1 CCCAATGACA 

1051 TACCCCAGAA 

55 1101 AGATCCAGAC 

1151 GCCTGCAACA 

1S01 CGTCACTGTG 

1E51 AGGAGCACGC 



GGGCTGAGTG TCTGTCGCTG 
CCTCTTCGGG GACCTGCCGG 
GGAAAGAAGC TCAGAAAGGA 
AGCAGTACT6 ACTCAGGATC 
ACCCGCTAGC AGTGGCGATT 
TGGTAAAGAC TGAAGGGAAA 
AAGAATGGCA GTGAAGAGCT 
GGTGATCTTT GGTCT6AAGG 
AGGAGATGCA GGATGCCCAC 
AGGCCCCCAT CGTCCCTCAT 
TGGACATGGA GGAATTCGAG 
AAAACTTAAT CAGAAAATTT 
ACCGTGAAGA GATGCCTTTT 
CCTTAAACAA GCTTCCAGCC 
CCACGTGTGT TCTGGCTGTA 
GATAGTCGGG CAATCTTGTG 
AGCCTTAAGC CTCAGCAAAG 
TGAGGATACA GAAGGCTGGA 
GTGCTAGAGG TGTCACGCTC 
TGTCACCTCT GTGCCCGACA 
GGTTCATTTT GTTGGCCTGT 
GAAGCCGTGA ACTTCATCTT 
CCGGGAAGGG AAGTCCGCAG 
GGCTGGCCAA CAAGGCGGTG 
ATGGTGGTGC GGATAGGGCA 
ATGGTATTGA CTTAAAAGGT 



CTGCCGCCTC CACCCA6CCT 
AGCCCGAGCG CTCGCCGCGC 
CCCCTGCTCT TTGAT6ACCT 
AGGGGGACCT TTGCTTTTTG 
CAGGTTCTCT TGCCACATCA 
GGAGCAAAGA GAAAAACCTC 
TGTGGAAAAG AAAGTTTGTA 
GCTATGTGGC TGAGCGGAAG 
GTCATCCTGA ACGACATCAC 
TACTCGGGTT TCATATTTTG 
CCTCAAAATT TGCTGCACAG 
CCTAAAGGAG ATGTAATCAG 
GGACACTTTC AAGCATACTG 
AGAAGCCTGC CTGGAAAGAT 
GACAACATTC TTTATATTGC 
TCGTTATAAT GAGGAGAGTC 
AGCATAATCC AACTCAGTAT 
GGAAACGTCA GGGATGGGCG 
CATTGGGGAC GGGCAGTACA 
TCAGACGCTG CCAGCTGACC 
GATGGGCTCT TCAAGGTCTT 
GTCCTGTCTC GAGGATGAAA 
CCGACGCCCG CTACGAAGCA 
CAGCGGGGCT CGGCCGACAA 
CTGAGGGGTG GCGCGCGGCC 
TCATTTTGTG TGTGTGCACA 
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1301 TTGTGTGTTT TGTGTACTCC TGTGGGACTC CCATGGTTGT AAATAAAGGT 
1351 TTCTCTTTTT TTTTCCTAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA 
1M01 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAG 

5 

BLAST Results 



No BLAST result 

10 



Medline entries 



15 Il[]7i|31t: 

Tong Yn duirion Ri Shen SH-t Cloning and characterization of a 
novel 
• mammalian PP2C 
isozyme. J Biol Chem lTJfl Dec 2Si273(52) :352fl2- c i0 



25 



Peptide information for frame 2 



ORF from 5b bp to 1231 bpi peptide length: 3=12 
Category: strong similarity to known protein 
Classification: Protein management 
30 Prosite motifs: PP2C (117-155.) 



1 PIDLFGDLPEP ERSPRPAAGK EAflKGPLLFD DLPPASSTDS GSGGPLLFDD 

51 LPPASSGDSG SLATSISflMV KTEGKGAKRK TSEEEKNGSE ELVEKKVCKA 

35 101 SSVIFGLKGY VAERKGEREE MdDAHVILND ITEECRPPSS LITRVSYFAV 

151 FDGHGGIRAS KFAAflNLHdN LIRKFPKGDV ISVEKTVKRC LLDTFKHTDE 

201 EFLKflASSflK PAUKDGSTAT CVLAVDNILY IANLGDSRAI LCRYNEES<2K 

251 HAALSLSKEH NPTflYEERfIR IfiKAGGNVRD GRVL6VLEVS RSIGDGdYKR 

301 CGVTSVPDIR RCtJLTPNDRF ILLACDGLFK VFTPEEAVNF ILSCLEDEKI 

40 351 (2TREGKSAAD ARYEAACNRL ANKAVflRGSA DNVTVMVVRI GH 



45 



BLASTP hits 
No BLASTP hits available 

Alert BLASTP hits for DKFZphtes3_31 J20-, frame 2 
50 No Alert BLASTP hits found 

Pedant information for DKFZphtes3_31 j20i frame 2 



55 Report for DKFZphtes3_31 j20-2 

ELENGTH3 41D 
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imi 4475T.fi5 

EpI3 7-T5 

EH0I10L3 TREMBL : AFOTSTB?^ product: "protein phosphatase 

BC"n Rattus norvegicus protein phosphatase BC mRNA-i complete cds. 
5 D-D 

EFUNCAT3 03. Dl cell growth IS. cerevisiaen YDL00bw3 be-BS 

EFUNCAT3 10. 03. 13 key phosphatases IS- cerevisiaen YDL00bw3 

be-B5 

EFUNCAT3 QT.lb mitochondrial biogenesis IS- cerevisiaen 

10 YDL00bw3 be-BS 

EFUNCAT3 11.01 stress response IS- cerevisiaen YDL00bw3 be-BS 

CFUNCATJ 03- 04 budding-, cell polarity and filament formation 

IS. cerevisiaen YDL00bw3 be-B5 

CFUNCAT3 01.05-04 regulation of carbohydrate utilization ES. 
15 cerevisiaen YDL00bw3 be-B5 

EFUNCAT3 Tfl classification not yet clear-cut IS* cerevisiaen 

YERDfllc]] le-B3 

EFUNCAT3 unclassified proteins IS- cerevisiaen Y0R0 e i0c3 

le-lB 

20 IFUNCAT3 03-BB cell cycle control and mitosis IS- cerevisiaen 

YJL005w3 3e-10 

CFUNCAT3 03.10 sporulation and germination ES. cerevisiaen 

YJL005w3 3e-10 

EFUNCAT3 30-OB organization of plasma membrane IS. cerevisiaen 

25 YJL005w3 3e-10 

EFUNCAT3 01.03-10 metabolism of cyclic and unusual nucleotides 

IS. cerevisiaen YJL005w3 3e-10 

EFUNCAT3 10.04-03 second messenger formation IS- cerevisiaen 

YJL00Sw]l 3e-10 

30 EBL0CKS3 PR010B3F 

EBL0CKS3 PR00b?7D 

EBL0CKS3 BL0103BI 

EBL0CKS3 BL0103BH 

EBL0OCS3 BL0103BG 

35 [[BLOCKS! BL0103BC Protein phosphatase 2C proteins 

EBL0CKS3 BL0103BB Protein phosphatase BC proteins 

ESC0P3 dlabq 4.^6-1-1-1 Protein serine/threonine 

phosphatase BC EHuma le-107 

EEC3 3-1-3-43 ^Pyruvate dehydrogenase ( lipoamide) 3- 
40 phosphatase 3e-0T 

EEC3 3-1-3-lb Phosphoprotein phosphatase 7e-35 

EEC3 4-b.l-l Adenylate cyclase Be-11 

CPIRKU3 duplication 5e-ll 

EPIRKU3 tandem repeat fle-OI 

45 EPIRKU3 serine/threonine-specif ic phosphatase Be-B7 

CPIRKM3 magnesium be-Bb 

EPIRKU3 cAMP biosynthesis Se-11 

CPIRKU3 liver Se-B7 

EPIRKU3 leucine zipper le-Dfl 

50 EPIRKU3 mitochondrion 3e-0T 

EPIRKU3 phosphoric monoester hydrolase 7e-35 

EPIRKW3 phosphorus-oxygen lyase Be-11 

CSUPFAP13 leucine-rich alpha-B-glycoprotein repeat homology Be-11 

55 CSUPFAM3 yeast adenylate cyclase catalytic domain homology Be-11 

ESUPFAM3 kinase interaction domain homology 3e-ll 

ESUPFAI13 yeast adenylate cyclase Se-11 
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CPR0SITE3 PPEC 1 

EPFAfO Protein phosphatase EC 

CK LI ID Alpha_Beta 

5 

SEfl AARGLSVCRCCRLHPASAMDLFGDLPEPERSPRPAAGKEAflKGPLLFDDLPPASSTDSGS 
PRD ccceeeeeeeeccccccceeeecccccccccccccccccccccccccccccccccccccc 

SEfl GGPLLFDDLPPASSGDSGSLATSISflMVKTEGKGAKRKTSEEEKNGSEELVEKKVCKASS 
10 PRD ccceeeccccccccccccccccccccccccccccccccccccccccccccccccccccce 

SEfl VIFGLKGYVAERKGEREEMflDAHVILNDITEECRPPSSLITRVSYFAVFDGHGGIRASKF 
PRD eeeceeeecchhhhhhhhhhhhheeeeccccccccccccccceeeeeeeccccchhhhhh 

15 SEfl AAflNLHflNLIRKFPKGDVISVEKTVKRCLLDTFKHTDEEFLKflASSflKPAWKDGSTATCV 
PRD hhhhhhhhhhhcccccccchhhhhhhhhhhhhhhhhhhhhhhhhhhcccccccccceeee 



20 



30 



SEfl LAVDNILYIANLGDSRAILCRYNEESfiKHAALSLSKEHNPTflYEERflRIflKAGGNVRDGR . 

PRD eeccceeeeeccccceeeeeeccccccccceeeeecccccccchhhhhhhhcccceeeee 

SEfl VLGVLEVSRSIGDGflYKRCGVTSVPDIRRCflLTPNDRFILLACDGLFKVFTPEEAVNFIL 

PRD ccccceeeeccccccccccccccccccccccccccceeeeeecccccccccchhhhhhhh 



SEfl SCLEDEKIflTREGKSAADARYEAACNRLANKAVflRGSADNVTVflVVRIGH 
25 PRD hhhhhhhhhhhhcchhhhhhhhhhhhhhhhhhhhccccccceeeeeeccc 



Prosite for DKFZphtes3_31 j£0 - £ 
PSD1Q3E lfe5->17i{ PPEC PDOCOtmE 



35 Pfaro for DKFZphtes3_31 jEU - E 



HMI1_NAI1E Protein phosphatase EC 
40 HflPJ 

*GlCcl"lflGPRlilRMsMEDaHiaylNF pcnlDUUhiilFFGVFDGHg 

+++ +G R++M+DAH+ + ++ P++L ++ 

+++F+VFDGHG 

fluery ISA YVAERKG — EREEflflDAHVILNDITEECRPPSSLITR- 

45 VSYFAVFDGHG 173 

HUM GDflCSflklCgeHUHdll* 

G+++S++ +++H+ + 
fluery 171 GIRASKFAAflNLHflNL 161 

50 
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5. group: signal transduction 

DKFZphtes3_5kEE encodes a novel 455 amino acid protein with 
similarity to human paraneoplastic neuronal antigen HAL 

10 Antibodies against MAI where found in patients with 
paraneoplastic neurological disorders- The protein is 
predominantly expressed in testis and braini but ESTs are also 
found in liveri lung uterus and kidney- 

15 The new protein can find application in studying/therapy of 
paraneoplastic neurological disorders- 



20 



strong similarity to paraneoplastic neuronal antigen HA1 

Sequenced by fliagen 

Locus: unknown 

25 Insert length: 3534 bp 

Poly A stretch at pos- 3514i polyadenylation signal at pos- 34T4 



1 GAACGTCCGC 
30 51 GCCGCCGCCG 

101 AGGGACCTTG 
151 ACCCTAGGAG 
EDI AAATTAGTAT 
E51 ACTGGTGTCG 
35 301 GGGATCCCCG 
351 GGCTTGCAGG 
401 GGGAGGAGAA 
451 TATGCTTTGC 
501 GATTGTAAAA 
40 551 GCTTCTTAGA 
L01 GGGTCGGACA 
bSl CTGGACCTGG 
7D1 AAATGTTGTA 
751 CCAGGTGCAC 
45 BOl ACAGATGTGG 
flSl GCTTACGGGG 
=101 GCTTCCATAA 
=151 ACCTGTGGAG 
1DD1 AGGAGGCAGG 
50 1051 CTCCAAAGAG 
1101 GACTCGCCTG 
1151 GAGATAAGCT 
1E01 GCCCTGGTGA 
1ES1 TCCAGATAGG 
55 1301 CCAGGATCAC 
1351 TTTGATGCGA 
1401 ACACCGAAGG 
1451 GGAAACGCCA 



GCTGGGAGCC 
CGCATAGCCC 
CCCTGGGAGA 
TTGATCCAGA 
CCGCAGAGAT 
GGGGGAACAC 
AGGACTGTGG 
CACCTGGGCA 
CGCCCAGGCG 
TCCCAAGGGA 
CCCCGTAACT 
GGAGGAGAGG 
CCAATTGTTC 
GCCCAGACTC 
CCGAGAACTA 
TGGCCTTTGA 
CAGGTGCCCG 
CCCTGCTCTC 
CTGTGGAGGA 
AGCCATAAAA 
AGAGAAAGTA 
CTGTAGAAAA 
AAACGAGTCT 
TAAGCTGATG 
AGCTCCTGCG 
GAGAGTCTGG 
TGGGGTTGGG 
GGCCTTCCCA 
GGTGGTGTGG 
CACATTCTGC 



AGGGGTGCCC 
CCGGAGAGCC 
AGGCTGTGGA 
TATGTGCCTC 
TCGAGGACAT 
CTGAACACCC 
CGAGGATGAG 
GATACAGGGT 
ATTCTACTGG 
AATACCAGGA 
CAGATGGGGA 
CGGACCGTGT 
GGCTCCAAGA 
TGGGGGCAGC 
AGAGTGTTTT 
TGCCTGGCTT 
AGGGGGAAAA 
CAGGTGGTCA 
GTGCCTGGCT 
TTGCCCAGGT 
TCTAGCTTTG 
CAATGTGGTA 
TAAGTGGGGC 
AAACAGCGAA 
TGAGGAGGAG 
AGGGGCTGGA 
GCAGTACCTC 
GGGCTACCGG 
CAAGGGCTGG 
TATAGCTGTG 



GACCCCCGTC 
CTCTGGGGAC 
GACCTGGGCC 
ACGCCCTGAT 
GCCGTTGACC 
GGAGGTGCAT 
TTTGAGGAGA 
GATTGGCAGG 
AGCTGGCACA 
AAGGGGGGGC 
ATTTCTCAAC 
CAGATATGAA 
GTGACTATAT 
AGTGCAGCCT 
CTGGGAACAC 
GAGCACACCA 
GAGGCGGAGG 
GTGGGCTCCG 
GCCTTGCAGC 
GAAGTTGTGT 
TGTTACGTTT 
TCACGTAGAA 
CACCCTTCCT 
G6AAGCCTCC 
GAATGGGAGG 
AGTAGCCCCA 
TCCCTGCCTC 
CGCCGGAGGG 
CTCTCGAGGC 
GGGAAGACGG 



CGCCGCCGCC 
CCCGACCAGA 
TTCTGCGATC 
CACTCCCCCC 
TTGTTACAGG 
GCTCATCCTG 
CACTCCAGGA 
ATGTTTAGGA 
AGATATCGAC 
CCTGGGAAGT 
AGACTGAACC 
CCGAGTCCTC 
CACCAGAGTT 
CTGCTAGAAC 
CATATCCATC 
CTGAGATGCT 
CTGATGGAAT 
GGCCAGCAAT 
AGGTGTTCGG 
AAAGCCTATC 
GGAACCCCTG 
ACGTGAATCA 
GACAAACTCC 
TGGTTTCCT6 
CCACTTTAGG 
AGGCCACCTG 
TGGCAACAGT 
GCAGAGGCCA 
TCAAGAAAAC 
CCACATCAGG 
1501 GTACAGTGCA TCAACCCCTC CAACCTGCTC TTGGCCAAGG AGACAAAAGA
1551 GATATTGGAA GGAGGGGAAA GAGAAGCCCA GACAAACAGC AGATGAGTTG
1601 AGTGGGGCAG AGGGACAGGG CAGCCAGACC AAGGCCAAGC CTTCTCACCC
1651 TTGGCCAGCT GGAAGGGACT TCAGCAACCA AGACCACCTG GCAACAGGCT
1701 CAGTGGGGGT CAGGTCCAGG TCCCCGAAGA GGTGCTGGAG AGGAAAGCAG
1751 GGAGCCACTG CATCCAGCAC ATGGGGTGCC TGGGCCTCAG ATGGGGACCC
1801 CAAAGAAGCA GAAGCTGAAG AAGGTACGGC TGGGGGTTCT GTCCTGCTCA
1851 TCCAACCACC CCTAAATACC CACCCTGTGG ACTTTGAGCT GAACATGCCC
1901 ACTGGCCCCC AGGCCACATG GGACCTGGAG GAGCCTACCT GGGGCCTGCC
1951 CCTGCCAGCA GGTGCCAGGG CTGGTGAGGA AGAGCTGGGG GGCAGAGGTA
2001 AAGCCCTGCA GGGGAGGCCA CAGGGTCCAT CCCGTCTTCA GGATCATCTA
2051 CACTGCACTA GGGGAGCCCC AGGAAGGCAG CACCCTGGAG GCCCTGTGCC
2101 AGTGAGGACA GGAGACCCTA AGGCCCCGGG AGCCCAGTGC CAGCCAGAGG
2151 TTGTGCAGGC AAGGAGACCA AAGATTGATG AGAAGACCCC CAGCAGGGGT
2201 ACTGGGTACC CGGCAGGCCA GTGCCCTCAC AGTTGACTTG GACCAGGGTG
2251 GCTGTGAAGG GAAGTCTTTG TTGCAAAGGA GGAGGAGGAA AAGGGAGGAC
2301 TTGGTAGGGT TTTGTTTCTT CTGCTTGTTT CTGTACAGGG CCACCAGACT
2351 CCTGGAGAGA TCAAGCAAGG AGAACCTGGG GCTGCCATGG CCAAAGCAAC
2401 TCAACAGATG CCAATGCCAA TTCCAAGGCC AGCCACAACC CTGCCACCTT
2451 GGGGAATCCA GCCTGGAGGC ATCCCCTAAG CAGCCAGCCA TGGCCTGGGT
2501 GGAGGCACCT GAAGACGTCT GTCCCAAACT CCCCCAGCCC TGAGCTGGGA
2551 GATGACAGGG GGAAAGAGGC CCTCTCAAGG GTGCCAGATG CCTGGGTCTC
2601 CCAAGAGGGG TCCCCCAACT CACCGTTCCC GGGACAGGCT GCCCCCTGTT
2651 CCAGGAAGCT CATCCTCACC TGTGTAGGCC CCTGTAGTGA CCCACGCGTC
2701 CAGCAGACGC CCACCCACCG CTAGCCGTTG TTCCTGTGCA AAGTAGTGTG
2751 CTATGCACCC ACCCAGGTGG CCGCCTCTGG GCCCAAGGCA CATGCTGTGA
2801 GCTTCCTGTG AGCCCAGGCT CTGCTCACTG CTGTCCCGCG TCATGAGCAC
2851 CACCTCTGCT TTCCCTGTGT AGATCTAGGC CAGTGGCTGC TTGTTCTTGT
2901 GGAGCTGTGT GTGTTCTTCT CTGAGCAGCT CCTCCCCGGA GTCCCCCAGC
2951 ACAGTCCCAG GAGATGACAG GAAGGAAGCA CCAGGGCAAG GCGGACGCTC
3001 ACCCTGTGAC CACGATGGTG ACCGTGGCTG TGGGAGGAAG AACTGGACCC
3051 AGGACGGAGC GGGGCTGCCC TGCCTGAGGC TCCCGAGGAG CTTTGTGCTT
3101 TGGTGTTCCA CCCCTGTTGT TACTCATGAC TCAGTTTCCT TGACCTGGTA
3151 GGGTGTTCCC TGCTGTGTTT TCCAGTGTCC TGTGACTGTC CTGTGCGGGC
3201 CATAGGGCAG GGCCCTGCCC CAGCAGATGG GCTTGGGAGG GGGCTCCCTA
3251 AAGCCAGTGG ACACTGCCAG AGTCTACCTT CCTGGCAAGA GGCAGACCCC
3301 GGGGCCCTCA GGAAGGAGGG AGTTGGCAGC GGGGGCTGCA GCAGGAGTAG
3351 GAGCAGATGA GGCGTCTTGC CAGGAACCTC AGGAGGAGGG GGCCCGGGAC
3401 CTGTGTGGGA CCTGTGTCCT GTGGTGGCCG TTTGCAGTTT CTCTCTGTGT
3451 TGTGATTCCC TTCTCTTCAA TGGTTTCAGT ACGTGTTTCT CTTCAATAAA
3501 CTTCATTCAG TGTTAAAAAA AAAAAAAAAA AAAA



BLAST Results 


No BLAST result 



Medline entries



=H15fll7=J: 

Ma1, a novel neuron- and testis-specific protein, is recognized by the serum of patients with paraneoplastic neurological disorders.
WO 01/98454 PCT/IB01/02050



Peptide information for frame 1 

ORF from 229 bp to 1593 bp; peptide length: 455

Category: strong similarity to known protein 
Classification: unclassified 

1 MPLTLLQDWC RGEHLNTRRC MLILGIPEDC GEDEFEETLQ EACRHLGRYR

51 VIGRMFRREE NAQAILLELA QDIDYALLPR EIPGKGGPWE VIVKPRNSDG

101 EFLNRLNRFL EEERRTVSDM NRVLGSDTNC SAPRVTISPE FWTWAQTLGA

151 AVQPLLEQML YRELRVFSGN TISIPGALAF DAWLEHTTEM LQMWQVPEGE

201 KRRRLMECLR GPALQVVSGL RASNASITVE ECLAALQQVF GPVESHKIAQ

251 VKLCKAYQEA GEKVSSFVLR LEPLLQRAVE NNVVSRRNVN QTRLKRVLSG

301 ATLPDKLRDK LKLMKQRRKP PGFLALVKLL REEEEWEATL GPDRESLEGL

351 EVAPRPPARI TGVGAVPLPA SGNSFDARPS QGYRRRRGRG QHRRGGVARA

401 GSRGSRKRKR HTFCYSCGED GHIRVQCINP SNLLLAKETK EILEGGEREA

451 QTNSR



BLASTP hits 

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_5k22, frame 1

TREMBLNEW:AB020690_1 gene: "KIAA0883"; product: "KIAA0883 protein"; Homo sapiens mRNA for KIAA0883 protein, complete cds, N = 1, Score = 722, P = 2.1e-71

TREMBL:AF037361_1 gene: "MA1"; product: "paraneoplastic neuronal antigen MA1"; Homo sapiens paraneoplastic neuronal antigen MA1 (MA1) mRNA, complete cds, N = 1, Score = 665, P = 2.6e-65



>TREMBLNEW:AB020690_1 gene: "KIAA0883"; product: "KIAA0883 protein"; Homo sapiens mRNA for KIAA0883 protein, complete cds.
Length = 3bi» 

HSPs: 

Score = 722 (lOfl.3 bits), Expect = 2.4e-71, P = 2.4e-71
Identities = 156/348 (44%), Positives = 215/348 (61%)
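The two percentages above are consistent with dividing the identity and positive counts by the 348-column alignment and truncating, not rounding, to a whole percent (156/348 is 44.8 %, printed as 44 %). A minimal sketch of that arithmetic; truncation is an assumption inferred from these two numbers, not a documented BLAST setting, and the function name is illustrative:

```python
def blast_percent(count: int, aligned_length: int) -> int:
    """Return count/aligned_length as a whole percent, truncated toward zero.

    Assumption: the report truncates rather than rounds, as inferred from
    156/348 -> 44% and 215/348 -> 61% in the HSP line above.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return (100 * count) // aligned_length


identities = blast_percent(156, 348)
positives = blast_percent(215, 348)
print(f"Identities = 156/348 ({identities}%), Positives = 215/348 ({positives}%)")
```

Integer division keeps the result exact, so no floating-point rounding can push a value such as 44.8 up to 45.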



Query:    1 MPLTLLQDWCRGEHLNTRRCMLILGIPEDCGEDEFEETLQEACRHLGRYRVIGRMFRREE 60
            M L LL+DWCR   ++ ++ +++ GIP D  E E +E LQE  + LGRYR++G++FR++E
Sbjct:    1 MALALLEDWCRINSVDEQKSLMVTGIPADFEEAEIQEVLQETLKSLGRYRLLGKIFRKQE 60
Query:   61 NAQAILLELAQDIDYALLPREIPGKGGPWEVIVKPRNSDGXXXXXXXXXXXXXXXTVSDM 120
            NA A+LLEL +D D + +P E+ GKGG W+VI K  N D                TVS M
Sbjct:   61 NANAVLLELLEDTDVSAIPSEVQGKGGVWKVIFKTPNtmEFLERLNLFLEKEGinVSGM 120

Query:  121 NRVLGSDTNCSAPRVTISPEFWTW--AQTLGAAVQPLLEQMLYRELRVFSGNTISIPGAL 178
            R LG + A ISPE Q + A QPLL M YR+LRVFSG+ + P
Sbjct:  121 FRALGQEGVSPATVPCISPELLAHLLGQAQAHAPQPLLP-FIRYRKLRVFSGSAVPAPEEE 179

Query:  179 AFDAWLEHTTEMLQMWQVPEGEKRRRLMECLRGPALQVVSGLRASNASITVEECLAALQQ 238
            +F+ WLE TE+++ III V E EK+R L E LRGPAL ++ ++A N SI+VEECL A +Q
Sbjct:  180 SFEVWLEQATEIVKEIjPVTEAEKKRIJLAESLRGPALDLIIHIVQADNPSISVEECLEAFKQ 239
Query:  239 VFGPVESHKIAQVKLCKAYQEAGEKVSSFVLRLEPLLQXXXXXXXXXXXXXXXXXLKRVL 298
            VFG +ES + AQV+ K YQE GEKVS++VLRLE LL + L++V+
Sbjct:  240 VFGSLESRRTAQVRYLKTYQEEGEKVSAYVLRLETLLRRAVEKRAIPRRIADQVRLEQVM 299

Query:  299 SGATLPDKLRDKLKLMKQRRKPPGFLALVKLLREEEEWEATLGPDRESLE 348
            +GATL L +L+ +K + PP FL L+K++REEEE EA+ + ES+E
Sbjct:  300 AGATLNQMLIWCRLRELKDQGPPPSFLELMKVIREEEEEEASF--ENESIE 347

Pedant information for DKFZphtes3_5k22, frame 1

Report for DKFZphtes3_5k22.1

[LENGTH] 455
[MW] 51514.34
[pI] 9.27
[HOMOL] TREMBLNEW:AB020690_1 gene: "KIAA0883"; product: "KIAA0883 protein"; Homo sapiens mRNA for KIAA0883 protein, complete cds. 3e-75
[BLOCKS] BL00876B Indoleamine 2,3-dioxygenase proteins
[PFAM] Zinc finger, CCHC class
[KW] Alpha_Beta
[KW] LOW_COMPLEXITY 13.41 %



SEQ MPLTLLQDWCRGEHLNTRRCMLILGIPEDCGEDEFEETLQEACRHLGRYRVIGRMFRREE
SEG
PRD ccchhhhhccccccccccceeeeeecccccchhhhhhhhhhhhhhccceeehhhhhhhhh

SEQ NAQAILLELAQDIDYALLPREIPGKGGPWEVIVKPRNSDGEFLNRLNRFLEEERRTVSDM
SEG xxxxxxxxxxxxxxx
PRD hhhhhhhhhhhcccccccccccccccceeeeeeecccccchhhhhhhhhhhhhhchhhhh

SEQ NRVLGSDTNCSAPRVTISPEFWTWAQTLGAAVQPLLEQMLYRELRVFSGNTISIPGALAF
SEG
PRD hhhhcccccccccccccccchhhhhhhhhhhhhhhhhhhhhhhheeeccccccccchhhh

SEQ DAWLEHTTEMLQMWQVPEGEKRRRLMECLRGPALQVVSGLRASNASITVEECLAALQQVF
SEG
PRD hhhhhhhhhhhhhhhccchhhhhhhhhhhhccccccccccccccceeehhhhhhhhhhhh

SEQ GPVESHKIAQVKLCKAYQEAGEKVSSFVLRLEPLLQRAVENNVVSRRNVNQTRLKRVLSG
SEG xxxxxxxxxxxxxxxxx
PRD hccchhhhhhhhhhhhhhhcccccceeeeehhhhhhhhhhhhcchhhhhhhhhhhhhhhc

SEQ ATLPDKLRDKLKLMKQRRKPPGFLALVKLLREEEEWEATLGPDRESLEGLEVAPRPPARI
SEG
PRD ccccchhhhhhhhhhhhccccchhhhhhhhhhhhhhhhhcccchhhhheeeecccccccc

SEQ TGVGAVPLPASGNSFDARPSQGYRRRRGRGQHRRGGVARAGSRGSRKRKRHTFCYSCGED
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD eeeeeeccccccccccccccccccccccccccceeeeeeccccccccccceeeeeccccc

SEQ GHIRVQCINPSNLLLAKETKEILEGGEREAQTNSR
SEG
PRD ceeeeeeccccchhhhhhhhhhhcccccccccccc
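The LOW_COMPLEXITY keyword in the report header matches the share of residues marked x on the SEG track: 15 + 17 + 29 = 61 of 455 residues here (13.41 %), and 107 of 703 residues for the 703-amino-acid protein reported further on (15.22 %). A minimal sketch that recomputes the figure from the SEG rows, assuming each x marks exactly one masked residue; the function name is illustrative:

```python
def low_complexity_percent(seg_rows: list[str], length: int) -> float:
    """Percentage of residues flagged as low complexity on a SEG track.

    seg_rows: SEG lines of a Pedant-style report with the leading "SEG"
    label stripped; every 'x' is counted as one masked residue.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    masked = sum(row.count("x") for row in seg_rows)
    return round(100 * masked / length, 2)


# The three non-empty SEG rows of the 455-residue protein above.
rows = ["x" * 15, "x" * 17, "x" * 29]
print(low_complexity_percent(rows, 455))  # 13.41
```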



30 



(No Prosite data available for DKFZphtes3_5k22.1)

Pfam for DKFZphtes3_5k22.1 



HMM_NAME Zinc finger, CCHC class

HMM    *<2kCl)NCGKPGHIiriRDCPE*
       C++CG+ GH+ +C +
Query  412 TFCYSCGEDGHIRVQCIN 429
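The CCHC-class zinc finger (zinc knuckle) is commonly summarised by the spacing C-x(2)-C-x(4)-H-x(4)-C, and the query segment above fits that spacing at residues 414-427 (CYSCGEDGHIRVQC). A minimal regular-expression sketch, assuming that fixed spacing; it is a quick screen, not a substitute for the Pfam HMM, and the names used are illustrative:

```python
import re

# Assumed core spacing of a CCHC zinc knuckle: C-x(2)-C-x(4)-H-x(4)-C.
ZINC_KNUCKLE = re.compile(r"C.{2}C.{4}H.{4}C")


def find_zinc_knuckles(protein: str, offset: int = 1) -> list[tuple[int, int, str]]:
    """Return (start, end, motif) for each knuckle; positions are 1-based.

    offset is the residue number of the first character of `protein`,
    so a fragment can be reported in full-length coordinates.
    """
    hits = []
    for m in ZINC_KNUCKLE.finditer(protein):
        start = m.start() + offset
        end = m.end() - 1 + offset
        hits.append((start, end, m.group()))
    return hits


# Residues 411-430 of DKFZphtes3_5k22.1, taken from the peptide listing.
segment = "HTFCYSCGEDGHIRVQCINP"
print(find_zinc_knuckles(segment, offset=411))  # [(414, 427, 'CYSCGEDGHIRVQC')]
```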
DKFZphtes3_7n12

group: transmembrane protein

DKFZphtes3_7n12 encodes a novel 703 amino acid protein without similarity to known proteins.

The novel protein contains 1 transmembrane domain.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific genes and as a new marker for testicular cells.

putative protein
contains transmembrane domain
perhaps complete cds.

Sequenced by BMFZ

Locus: unknown

Insert length: 2347 bp

Poly A stretch at pos. 2271, polyadenylation signal at pos. 2253
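The insert annotations give the start of the poly(A) stretch and the position of the polyadenylation signal, but the rules used to call them are not stated. A minimal sketch, assuming the canonical AATAAA hexamer (plus the common ATTAAA variant) and treating the 3' poly(A) stretch as a trailing run of at least `min_run` A residues; the function names, hexamer list and thresholds are illustrative, and the coordinates it returns need not follow the same convention as the reported ones:

```python
import re
from typing import List, Optional

SIGNALS = ("AATAAA", "ATTAAA")  # assumed hexamers; AATAAA is the canonical one


def poly_a_start(seq: str, min_run: int = 10) -> Optional[int]:
    """1-based start of the trailing run of A's, or None if shorter than min_run."""
    seq = seq.upper()
    stripped = seq.rstrip("A")
    run = len(seq) - len(stripped)
    return len(stripped) + 1 if run >= min_run else None


def signal_positions(seq: str, window: int = 50) -> List[int]:
    """1-based starts of candidate hexamers within `window` nt upstream of the tail."""
    seq = seq.upper()
    tail = poly_a_start(seq)
    end = tail - 1 if tail else len(seq)
    start = max(0, end - window)
    hits = []
    for hexamer in SIGNALS:
        # Zero-width lookahead so overlapping hexamers are all reported.
        for m in re.finditer(f"(?={hexamer})", seq[start:end]):
            hits.append(start + m.start() + 1)
    return sorted(hits)


demo = "GCGC" + "AATAAA" + "GCGCGCGCGCGC" + "A" * 20  # made-up test sequence
print(poly_a_start(demo), signal_positions(demo))  # 23 [5]
```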



1 CGGCTGCAGT CTGGGCCGGG GCCCTGTGCC GCTGAAGACA TGGAGTTTGT
51 GTCTGGATAC CGGGATGAGT TCCTTGATTT CACTGCCCTT CTCTTCGGCT
101 GGTTCCGAAA GTTTGTGGCA GAGCGTGGAG CTGTAGGGAC TAGCCTTGAG
151 GGCCGCTGCC GGCAGCTGGA GGCCCAGATC AGAAGGCTAC CCCAGGACCC
201 TGCCCTTTGG GTGCTCCATG TCCTGCCCAA CCATAGTGTG GGCATCAGCC
251 TGGGGCAAGG GGCAGAACCA GGTCCTGGAC CAGGCCTGGG GACTGCCTGG
301 CTCCTGGGAG ACAACCCTCC ACTCCACCTG CGAGACCTGA GCCCCTACAT
351 CAGCTTTGTC AGCCTAGAGG ATGGGGAGGA AGGGGAGGAG GAAGAGGAGG
401 AAGATGAAGA AGAAGAGAAG AGAGAGGACG GGGGTGCAGG CAGCACAGAG
451 AAGGTGGAAC CAGAGGAGGA CCGGGAGCTA GCCCCTACCA GCAGGGAGTC
501 CCCCCAGGAA ACAAACCCTC CAGGAGAGTC AGAGGAGGCT GCCCGGGAGG
551 CAGGAGGTGG CAAGGATGGC TGCCGAGAGG ACAGGGTGGA GAACGAAACA
601 AGACCCCAGA AGAGGAAGGG ACAGAGGAGT GAGGCTGCCC CCCTGCACGT
651 TTCCTGTCTC TTACTTGTGA CGGATGAGCA TGGCACCATC TTGGGCATTG
701 ATCTGCTAGT GGATGGAGCC CAGGGAACCG CAAGCTGGGG CTCAGGGACC
751 AAGGACCTGG CTCCTTGGGC CTATGCTCTC CTCTGTCACA GCATGGCCTG
801 TCCCATGGGC TCTGGGGATC CCCGAAAGCC CCGACAGCTT ACTGTGGGAG
851 ATGCCCGGCT GCATCGAGAG CTGGAGAGCT TGGTCCCAAG GCTAGGTGTG
901 AAGTTAGCCA AAACCCCAAT GCGGACATGG GGTCCCCGGC CAGGCTTCAC
951 CTTTGCTTCC CTTCGTGCTC GAACCTGCCA TGTGTGTCAC AGGCACAGCT
1001 TTGAAGCGAA GCTGACACCT TGCCCCCAGT GTAGTGCTGT CTTGTATTGT
1051 GGAGAGGCTT GTCTCCGGGC TGACTGGCAG CGGTGCCCAG ATGATGTGAG
1101 TCACCGATTT TGGTGCCCAA GGCTTGCAGC CTTCATGGAG CGGGCAGGAG
1151 AACTGGCAAC CCTACCTTTT ACCTACACCG CAGAGGTGAC CAGTGAAACC
1201 TTCAACAAAG AGGCCTTCCT GGCCTCTCGG GGCCTCACTC GTGGCTATTG
1251 GACCCAGCTC AGCATGCTGA TTCCAGGCCC GGGCTTCTCC AGACACCCCC
1301 GAGGCAACAC GCCATCCCTC AGCCTTCTTC GCGGTGGAGA CCCCTACCAG



1351 CTTCTCCAGG GAGACGGGAC TGCCCTGATG CCTCCTGTGC CCCCACATCC
1401 ACCCCGGGGT GTTTTTGTCC CTGAGCTCAA CATCCAAAAC AAACAGTCAC
1451 TGAAGATCCA CGTGGTGGAG GCCGGGAAGG AGTTTGACCT TGTCATGGTG
1501 TTTTGGGAGC TTTTGGTCCT GCTCCCCCAT GTGGCCCTGG AGCTGCAGTT
1551 TGTAGGTGAT GGCCTGCCCC CCGAAAGCGA CGAGCAGCAT TTTACCCTGC
1601 AGAGGGACAG CCTGGAGGTG TCTGTCCGGC CTGGTTCCGG CATATCAGCA
1651 CGGCCCAGCT CTGGCACTAA GGAGAAAGGG GGCCGCAGGG ACCTGCAGAT
1701 CAAGGTGTCA GCAAGGCCCT ACCACCTGTT CCAGGGGCCC AAGCCTGACC
1751 TGGTTATTGG ATTTAACTCC GGGTTTGCTC TCAAGGATAC GTGGCTGAGG
1801 TCTCTGCCCC GGTTACAGTC CCTCCGAGTG CCAGCCTTCT TCACCGAGAG
1851 CAGCGAGTAC AGCTGTGTGA TGGACGGCCA GACCATGGCG GTGGCCACTG
1901 GAGGGGGCAC CAGCCCTCCC CAGCCCAACC CCTTCCGCTC CCCCTTTCGC
1951 CTCAGAGCGG CCGACAACTG CATGTCCTGG TACTGCAATG CCTTCATCTT
2001 CCACCTGGTT TACAAGCCTG CTCAAGGGAG CGGGGCCCGC CCGGCGCCCG
2051 GGCCCCCACC CCCATCCCCA ACTCCCTCTG CTCCTCCTGC CCCCACCCGA
2101 AGGCGCCGAG GAGAAAAGAA ACCTGGGCGG GGGGCCCGCC GGCGGAAATG
2151 AATGCTGATA CCCTAGTAGT CCCCAGCTCC CAAACACTGA AAGGAAAACG
2201 TGAAAACACT CAAGGCCTAG GGGGAGGACA GGTTGGTAAA ACATGAAAAG
2251 GTAAATAAAA TTACTTGTTT GAAAAAAAAA AAAAAAAAAA AAAAAAAAAA
2301 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAAAA



BLAST Results 


No BLAST result

Medline entries

No Medline entry

Peptide information for frame 1 



ORF from 40 bp to 2148 bp; peptide length: 703
Category: putative protein
Classification: Transmembrane proteins unclassified
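In this listing the ORF coordinates cover the coding codons without the stop codon (the TGA at 2149-2151 follows the last Lys codon), so the peptide length is (end - start + 1) / 3 = (2148 - 40 + 1) / 3 = 703. A minimal sketch of that arithmetic together with a translation using the standard genetic code (NCBI translation table 1); it illustrates the bookkeeping and is not the pipeline that produced these annotations, and the function names are illustrative:

```python
# Standard genetic code (NCBI table 1); codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def peptide_length(orf_start: int, orf_end: int) -> int:
    """Residues encoded by a 1-based, inclusive ORF that excludes the stop codon."""
    span = orf_end - orf_start + 1
    if span <= 0 or span % 3:
        raise ValueError("ORF span must be a positive multiple of 3")
    return span // 3


def translate(dna: str) -> str:
    """Translate in-frame DNA; '*' marks a stop codon, 'X' an unrecognised codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


print(peptide_length(40, 2148))                     # 703
print(translate("ATGGAGTTTGTGTCTGGATACCGGGATGAG"))  # MEFVSGYRDE
```

The 30-nt input is positions 40-69 of the insert above, and its translation matches the first ten residues of the peptide listed below.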

1 MEFVSGYRDE FLDFTALLFG WFRKFVAERG AVGTSLEGRC RQLEAQIRRL

51 PQDPALWVLH VLPNHSVGIS LGQGAEPGPG PGLGTAWLLG DNPPLHLRDL

101 SPYISFVSLE DGEEGEEEEE EDEEEEKRED GGAGSTEKVE PEEDRELAPT

151 SRESPQETNP PGESEEAARE AGGGKDGCRE DRVENETRPQ KRKGQRSEAA

201 PLHVSCLLLV TDEHGTILGI DLLVDGAQGT ASWGSGTKDL APWAYALLCH

251 SMACPMGSGD PRKPRQLTVG DARLHRELES LVPRLGVKLA KTPMRTWGPR

301 PGFTFASLRA RTCHVCHRHS FEAKLTPCPQ CSAVLYCGEA CLRADWQRCP

351 DDVSHRFWCP RLAAFMERAG ELATLPFTYT AEVTSETFNK EAFLASRGLT

401 RGYWTQLSML IPGPGFSRHP RGNTPSLSLL RGGDPYQLLQ GDGTALMPPV

451 PPHPPRGVFV PELNIQNKQS LKIHVVEAGK EFDLVMVFWE LLVLLPHVAL

501 ELQFVGDGLP PESDEQHFTL QRDSLEVSVR PGSGISARPS SGTKEKGGRR

551 DLQIKVSARP YHLFQGPKPD LVIGFNSGFA LKDTWLRSLP RLQSLRVPAF

601 FTESSEYSCV MDGQTMAVAT GGGTSPPQPN PFRSPFRLRA ADNCMSWYCN

651 AFIFHLVYKP AQGSGARPAP GPPPPSPTPS APPAPTRRRR GEKKPGRGAR
701 RRK
BLASTP hits

No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_7n12, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_7n12, frame 1
Report for DKFZphtes3_7n12.1

[LENGTH] 703
[MW] 77315.72
[pI] 6.45
[KW] TRANSMEMBRANE 1
[KW] LOW_COMPLEXITY 15.22 %

SEQ MEFVSGYRDEFLDFTALLFGWFRKFVAERGAVGTSLEGRCRQLEAQIRRLPQDPALWVLH
SEG
PRD ccceeeccchhhhhhhhhhhhhhhhhhhhccccccchhhhhhhhhhhhhccccccccccc
MEM

SEQ VLPNHSVGISLGQGAEPGPGPGLGTAWLLGDNPPLHLRDLSPYISFVSLEDGEEGEEEEE
SEG xxxxxxxxxxx
PRD cccccccccccccccccccccceeeeeecccccccccccccceeeeeeccccchhhhhhh
MEM

SEQ EDEEEEKREDGGAGSTEKVEPEEDRELAPTSRESPQETNPPGESEEAAREAGGGKDGCRE
SEG xxxxxxxxxxxx xxxxxxxxxxxxx
PRD hhhhhhhhcccccccccccccccccccccccccccccccccchhhhhhhhccccccccce
MEM

SEQ DRVENETRPQKRKGQRSEAAPLHVSCLLLVTDEHGTILGIDLLVDGAQGTASWGSGTKDL
SEG
PRD eeccccccccccccccccccchhhhhheeeecccccccchhhhhhccccccccccccccc
MEM

SEQ APWAYALLCHSMACPMGSGDPRKPRQLTVGDARLHRELESLVPRLGVKLAKTPMRTWGPR
SEG
PRD hhhhhhhhhhhhccccccccccccceeeecchhhhhhhhhhhcccccccccccccccccc
MEM

SEQ PGFTFASLRARTCHVCHRHSFEAKLTPCPQCSAVLYCGEACLRADWQRCPDDVSHRFWCP
SEG
PRD ccccchhhhhhhhcccccccccccccccccceeeeccchhhhhhhhccccccccccccch
MEM

SEQ RLAAFMERAGELATLPFTYTAEVTSETFNKEAFLASRGLTRGYWTQLSMLIPGPGFSRHP
SEG
PRD hhhhhhhhhhhhhccccccccchhhhhhhhhhhhhhhhcccccchhhhhccccccccccc
MEM

SEQ RGNTPSLSLLRGGDPYQLLQGDGTALMPPVPPHPPRGVFVPELNIQNKQSLKIHVVEAGK
SEG xxxxxxxxxxxxxx
PRD cccccceeeeeccccceeeccccccccccccccccceeeeccccchhhhhheeeeeeccc
MEM

SEQ EFDLVMVFWELLVLLPHVALELQFVGDGLPPESDEQHFTLQRDSLEVSVRPGSGISARPS
SEG xxxxxxxxxxxxx
PRD cccchhhhhhhhhchhhhhhhhhhhcccccccchhhhhhhccccceeeeccccccccccc
MEM MMMMMMMMMMMMMMMMM

SEQ SGTKEKGGRRDLQIKVSARPYHLFQGPKPDLVIGFNSGFALKDTWLRSLPRLQSLRVPAF
SEG
PRD ccccccccccceeeeeeccccccccccccceeeecccccccccccccccccccccccccc
MEM

SEQ FTESSEYSCVMDGQTMAVATGGGTSPPQPNPFRSPFRLRAADNCMSWYCNAFIFHLVYKP
SEG x
PRD cccccceeeecccceeeeeecccccccccccccccchhhhhcchhhhhhhhhhhhhhccc
MEM

SEQ AQGSGARPAPGPPPPSPTPSAPPAPTRRRRGEKKPGRGARRRK
SEG xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
PRD ccccccccccccccccccccccccchhhhhccccccccccccc
MEM
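On the MEM row of a Pedant report, M flags a residue predicted to lie in the membrane. The MEM row printed above is not column-aligned with its SEQ row, so the exact span of the single predicted transmembrane segment cannot be read off directly. The sketch below shows how runs of M would become 1-based residue ranges once each MEM row is aligned with a 60-residue SEQ row; the function name and the example rows are illustrative, not taken from the report:

```python
def membrane_segments(mem_rows: list[str], row_width: int = 60) -> list[tuple[int, int]]:
    """Turn Pedant-style MEM rows into 1-based (start, end) residue ranges.

    Each row is assumed to be column-aligned with a SEQ row of `row_width`
    residues, with 'M' marking a predicted membrane residue.
    """
    # Pad or trim every row to the full width so positions stay consistent.
    track = "".join(row.ljust(row_width)[:row_width] for row in mem_rows)
    segments, start = [], None
    for pos, flag in enumerate(track, start=1):
        if flag == "M" and start is None:
            start = pos
        elif flag != "M" and start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, len(track)))
    return segments


# Illustrative rows: an empty first row, then 17 M's starting at column 6.
print(membrane_segments(["", " " * 5 + "M" * 17]))  # [(66, 82)]
```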



(No Prosite data available for DKFZphtes3_7n12.1)
(No Pfam data available for DKFZphtes3_7n12.1)
DKFZphtes3_9e16

group: transmembrane protein

DKFZphtes3_9e16 encodes a novel 539 amino acid protein without similarity to known proteins.

The novel protein contains 1 transmembrane region. The only EST described so far is from testis.

No informative BLAST results; no predictive PROSITE, Pfam or SCOP motifs.

The new protein can find application in studying the expression profile of testis-specific genes and as a new marker for testicular cells.

putative protein
1 EST hit
perhaps complete cds.
Sequenced by DKFZ
Locus: unknown
Insert length: 2011 bp

Poly A stretch at pos. 1986, no polyadenylation signal found



1 CATGGCAACA TGAGCAGTGC TGAGATAATT GGTTCTACAA ATCTTATAAT
51 TCTGCTAGAG GATGAAGTCT TTGCCGATTT TTTCAACACA TTTCTTTCCC
101 TCCCGGTTTT TGGTCAGACA CCATTTTATA CTGTTGAAAA TTCACAGTGG
151 AGCTTGTGGC CAGAAATACC TTGTAACTTG ATTGCCAAAT ACAAAGGGTT
201 ATTGACCTGG TTGGAAAAAT GCCGATTACC TTTCTTCTGT AAAACAAACT
251 TGTGTTTCCA TTACATTCTC TGTCAGGAGT TCATCAGTTT CATTAAGTCC
301 CCAGAAGGAG CCAAGATGAT GAGATGGAAA AAGGCAGACC AGTGGCTACT
351 CCAGAAATGC ATTGGCGGGG TCAGAGGGAT GTGGCGCTTC TATTCCTACC
401 TCACAGGCAG TGCAGGTGAA GAATTGGTGG ATTTCTGGAT CCTTGCTGAG
451 AACATCCTGA GCATAGATGA GATGGACCTG GAAGTGAGAG ACTACTACCT
501 GTCCCTCCTC CTCATGCTGA GGGCCACTCA TCTGCAGGAG GGCTCCAGGG
551 TGGTAACCCT CTGTAACATG AACATCAAGT CCCTCCTGAA CCTCTCCATC
601 TGGCATCCCA ACCAATCAAC CACTAGGAGG GAGATCCTGA GCCACATGCA
651 GAAAGTGGCT CTGTTCAAAC TCCAGAGCTA TTGGCTTCCC AACTTTTACA
701 CCCACACCAA GATGACCATG GCCAAGGAGG AAGCATGCCA TGGTCTGATG
751 CAAGAGTACG AGACTCGCTT ATACAGCGTT TGCTACACCC ACATAGGAGG
801 GCTCCCTCTG AACATGAGCA TCAAGAAGTG CCACCACTTT CAGAAACGGT
851 ACTCAAGCAG GAAAGCCAAG AGGAAGATGT GGCAATTGGT AGATCCTGAC
901 TCTTGGTCTC TGGAAATGGA TCTCAAGCCA GATGCTATTG GTATGCCCCT
951 ACAGGAGACA TGTCCTCAAG AGAAGGTGGT TATACAAATG CCTTCCCTGA
1001 AAATGGCTTC TTCAAAGGAA ACAAGAATCA GTTCCCTGGA AAAGGATATG
1051 CATTATGCAA AAATATCCAG CATGGAGAAT AAAGCCAAGA GCCACCTCCA
1101 CATGGAAGCC CCCTTTGAGA CAAAGGTCTC TACCCACCTG AGGACTGTCA
1151 TCCCCATTGT CAATCACTCC TCCAAGATGA CAATTCAGAA GGCCATCAAG
1201 CAAAGCTTCT CCTTAGGATA CATCCACTTG GCCTTGTGTG CTGATGCCTG
1251 TGCAGGGAAC CCTTTCCGGG ACCACCTGAA GAAGCTGAAT TTGAAAGTGG
1301 AGATCCAACT TCTTGACCTC TGGCAGGACT TGCAGCATTT CCTCAGTGTC
1351 CTTCTGAATA ACAAAAAGAA TGGGAATGCA ATCTTTCGTC ACTTGCTGGG
1401 TGACAGAATC TGCGAGCTCT ACCTGAATGA GCAGATTGGT CCGTGCTTAC
1451 CACTCAAATC CCAAACCATT CAGGGCCTGA AGGAACTATT GCCCTCTGGG
1501 GATGTGATCC CCTGGATTCC CAAAGCCCAG AAGGAGATTT GCAAGATGCT
1551 CAGTCCCTGG TATGATGAGT TTCTAGATGA AGAGGACTAC TGGTTTCTCC
1601 TTTTTACGGT AGGAAGGACT TTGGGTTAGG AAGGAATCAT GAGGATGAGG
1651 GAAGAAGAAA GAGTAATTAC TGTTTTAAAA GGGTTATGTG TTAAAGTAAA
1701 TGAAATTGTT ATTTTTCCTA GAGTCAACCA AAGATCAGCA TGGTCCCTGT
1751 TGTTCTAAAG CTAAACCTCT CAAGGAAAAG GACTCAGTGC ATAAGATGAC
1801 TTTGGTGAAA CCCCGTCTCT ACTAAAAATA CAAAAAATTA GCCGGGCGTA
1851 GTGGCGGGCG CCTGTAGTCC CAGCTACTTG GGAGGCTGAG GCAGGAGAAT
1901 GGTGTGAACC CGGGAGGCGG AGCTTGCAGT GAGCCGAGAT CCCGCCACTG
1951 CACGCCAGCC TGGGCGACAG AGCGAGACTC CGTCTCAAAA AAAAAAAAAA
2001 AAAAAAAAAA G



BLAST Results 




No BLAST result

Medline entries

No Medline entry
Peptide information for frame 1

ORF from 10 bp to 1626 bp; peptide length: 539
Category: putative protein
Classification: no clue

1 MSSAEIIGST NLIILLEDEV FADFFNTFLS LPVFGQTPFY TVENSQWSLW

51 PEIPCNLIAK YKGLLTWLEK CRLPFFCKTN LCFHYILCQE FISFIKSPEG

101 AKMMRWKKAD QWLLQKCIGG VRGMWRFYSY LTGSAGEELV DFWILAENIL

151 SIDEMDLEVR DYYLSLLLML RATHLQEGSR VVTLCNMNIK SLLNLSIWHP

201 NQSTTRREIL SHMQKVALFK LQSYWLPNFY THTKMTMAKE EACHGLMQEY

251 ETRLYSVCYT HIGGLPLNMS IKKCHHFQKR YSSRKAKRKM WQLVDPDSWS

301 LEMDLKPDAI GMPLQETCPQ EKVVIQMPSL KMASSKETRI SSLEKDMHYA

351 KISSMENKAK SHLHMEAPFE TKVSTHLRTV IPIVNHSSKM TIQKAIKQSF

401 SLGYIHLALC ADACAGNPFR DHLKKLNLKV EIQLLDLWQD LQHFLSVLLN

451 NKKNGNAIFR HLLGDRICEL YLNEQIGPCL PLKSQTIQGL KELLPSGDVI

501 PWIPKAQKEI CKMLSPWYDE FLDEEDYWFL LFTVGRTLG



BLASTP hits 


No BLASTP hits available

Alert BLASTP hits for DKFZphtes3_9e16, frame 1
No Alert BLASTP hits found

Pedant information for DKFZphtes3_9e16, frame 1

Report for DKFZphtes3_9e16.1

[LENGTH] 542
' C Hid 3 LEIDb - Ob 

[pI] 8.35

[KW] Alpha_Beta

SEQ HGNMSSAEIIGSTNLIILLEDEVFADFFNTFLSLPVFGQTPFYTVENSQWSLWPEIPCNL
PRD cccccceeeeccccceeehhhhhhhhhccccccccccccccccccccccccccccccchh

SEQ IAKYKGLLTWLEKCRLPFFCKTNLCFHYILCQEFISFIKSPEGAKMMRWKKADQWLLQKC
PRD hhhhccceeecccccccccccccceeehhhhhhhhhhhccccchhhhhhhcchhhhhhhh

SEQ IGGVRGMWRFYSYLTGSAGEELVDFWILAENILSIDEMDLEVRDYYLSLLLMLRATHLQE
PRD ccccccceeeeeecccccccchhhhhhhhhhhhcccchhhhhhhhhhhhhhhhhhhhccc

SEQ GSRVVTLCNMNIKSLLNLSIWHPNQSTTRREILSHMQKVALFKLQSYWLPNFYTHTKMTM
PRD cceeeeecccchhhhhhhhhccccccchhhhhhhhhhhhhhhhhhhccccccchhhhhhh

SEQ AKEEACHGLMQEYETRLYSVCYTHIGGLPLNMSIKKCHHFQKRYSSRKAKRKMWQLVDPD
PRD hhhhhhhhhhhhhhhhheeeeeeccccccccccccccccchhhhhhhhhhhhhheeeccc

SEQ SWSLEMDLKPDAIGMPLQETCPQEKVVIQMPSLKMASSKETRISSLEKDMHYAKISSMEN
PRD cccccccccccccccccccccccceeeeeccccccccccccccchhhhhccccchhhhhh

SEQ KAKSHLHMEAPFETKVSTHLRTVIPIVNHSSKMTIQKAIKQSFSLGYIHLALCADACAGN
PRD hhhhhhhcccccccccccccceeeeeeeccccchhhhhhhhhhcccccccchhhhhhcccc

SEQ PFRDHLKKLNLKVEIQLLDLWQDLQHFLSVLLNNKKNGNAIFRHLLGDRICELYLNEQIG
PRD ccccchhhhhhhhhhhhhhhhhhhhh

SEQ PCLPLKSQTIQGLKELLPSGDVIPWIPKAQKEICKMLSPWYDEFLDEEDYWFLLFTVGRT
PRD ccccccchhhhhhhhccccccceeecccchhhhhhhcccchhhhhccccceeeccccccc

SEQ LG
PRD cc
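The MW entries in the Pedant report headers are presumably average molecular masses in daltons. A minimal sketch of that calculation, assuming standard average residue masses and one added water for the free termini; the program that produced the reports may have used slightly different mass tables, so the last digits of a recomputed value need not match the reported ones. The dictionary and function names are illustrative:

```python
# Average residue masses in daltons (amino acid minus water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528


def average_mass(protein: str) -> float:
    """Average mass of an unmodified linear peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


print(round(average_mass("G"), 2))   # 75.07 (free glycine)
print(round(average_mass("GG"), 2))  # 132.12 (glycylglycine)
```

Unknown letters such as X raise a KeyError, which is deliberate: silently skipping them would understate the mass.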



(No Prosite data available for DKFZphtes3_9e16.1)
(No Pfam data available for DKFZphtes3_9e16.1)



PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles that help to reliably identify to which known protein family (if any) a new sequence belongs. The World Wide Web URL http://www.expasy.ch/prosite/ is the entry point to the database. A description of the PROSITE consensus patterns follows.
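The consensus patterns listed below use PROSITE syntax: x is any residue, [..] lists allowed residues, {..} lists forbidden ones, (n) or (n,m) gives a repeat count, elements are joined by '-', and '<' or '>' anchors a pattern to the N- or C-terminus. A minimal sketch that translates such a pattern into a Python regular expression and reports every match, including overlapping ones, with 1-based positions. It covers only the syntax used in this list, not every PROSITE extension, and the function names are illustrative:

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE consensus pattern into a Python regex string."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Residue class followed by an optional repeat such as (2) or (2,3).
        m = re.fullmatch(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # a bracketed class or a single residue
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_sites(pattern: str, protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


print(prosite_to_regex("N-{P}-[ST]-{P}."))        # N[^P][ST][^P]
print(find_sites("N-{P}-[ST]-{P}.", "MNGSANKTW"))  # [(2, 'NGSA'), (6, 'NKTW')]
```

Wrapping the translated pattern in a lookahead is what lets adjacent sites such as overlapping N-glycosylation motifs all be reported, as PROSITE scans do.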



NAME: N-glycosylation site.
CONSENSUS: N-{P}-[ST]-{P}.

NAME: Glycosaminoglycan attachment site.
CONSENSUS: S-G-x-G.

NAME: Tyrosine sulfation site.

NAME: cAMP- and cGMP-dependent protein kinase phosphorylation site.
CONSENSUS: [RK](2)-x-[ST].

NAME: Protein kinase C phosphorylation site.
CONSENSUS: [ST]-x-[RK].

NAME: Casein kinase II phosphorylation site.
CONSENSUS: [ST]-x(2)-[DE].

NAME: Tyrosine kinase phosphorylation site.
CONSENSUS: [RK]-x(2,3)-[DE]-x(2,3)-Y.

NAME: N-myristoylation site.
CONSENSUS: G-{EDRKHPFYW}-x(2)-[STAGCN]-{P}.

NAME: Amidation site.
CONSENSUS: x-G-[RK]-[RK].

NAME: Aspartic acid and asparagine hydroxylation site.
CONSENSUS: C-x-[DN]-x(4)-[FY]-x-C-x-C.

NAME: Vitamin K-dependent carboxylation domain.
CONSENSUS: x(12)-E-x(3)-E-x-C-x(6)-[DEN]-x-[LIVMFY]-x(4)-[FYW].

NAME: Phosphopantetheine attachment site.
CONSENSUS: [DEQGSTALMKRH]-[LIVMFYSTAC]-[GNQ]-[LIVMFYAG]-[DNEKHS]-S-[LIVMST]-{PCFY}-[STAGCPQLIVMF]-[LIVMATN]-[DENQGTAKRHLM]-[LIVMWSTA]-[LIVGSTACR]-x(2)-[LIVMFA].

NAME: Acyl carrier protein phosphopantetheine domain profile.

NAME: Prokaryotic membrane lipoprotein lipid attachment site.
CONSENSUS: {DERK}(6)-[LIVMFWSTAG](2)-[LIVMFYSTAGCQ]-[AGS]-C.

NAME: Prokaryotic N-terminal methylation site.
CONSENSUS: [KRHEQSTAG]-G-[FYLIVM]-[ST]-[LT]-[LIVP]-E-[LIVMFWSTAG](14).

NAME: Prenyl group binding site (CAAX box).
CONSENSUS: C-{DENQ}-[LIVM]-x>.

NAME: Protein splicing signature.
CONSENSUS: [DNEG]-x-[LIVFA]-[LIVMY]-[LVAST]-H-N-[STC].



NAME: Endoplasmic reticulum targeting sequence.
CONSENSUS: [KRHQSA]-[DENQ]-E-L>.

NAME: Microbodies C-terminal targeting signal.
CONSENSUS: [STAGCN]-[RKH]-[LIVMAFY]>.

NAME: Gram-positive cocci surface proteins 'anchoring' hexapeptide.
CONSENSUS: L-P-x-T-G-[STGAVDE].

NAME: Bipartite nuclear targeting sequence.

NAME: Cell attachment sequence.
CONSENSUS: R-G-D.

NAME: ATP/GTP-binding site motif A (P-loop).
CONSENSUS: [AG]-x(4)-G-K-[ST].

NAME: Cyclic nucleotide-binding domain signature 1.
CONSENSUS: [LIVM]-[VIC]-x(2)-G-[DENQTA]-x-[GAC]-x(2)-[LIVMFY](4)-x(2)-G.

NAME: Cyclic nucleotide-binding domain signature 2.
CONSENSUS: [LIVMF]-G-E-x-[GAS]-[LIVM]-x(5,11)-R-[STAQ]-A-x-[LIVMA]-x-[STACV].

NAME: cAMP/cGMP binding motif.
NAME: EF-hand calcium-binding domain.
CONSENSUS: D-x-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW].

NAME: Actinin-type actin-binding domain signature 1.
CONSENSUS: [EQ]-x(2)-[ATV]-[FY]-x(2)-W-x-N.

NAME: Actinin-type actin-binding domain signature 2.
CONSENSUS: [LIVM]-x-[SGN]-[LIVM]-[DAGHE]-[SAG]-x-[DNEAG]-[LIVM]-x-[DEAG]-x(4)-[LIVM]-x-[LM]-[SAG]-[LIVM]-[LIVMT]-W-x-[LIVM](2).

NAME: Anaphylatoxin domain signature.
CONSENSUS: [CSH]-C-x(2)-[GAP]-x(7,8)-[GASTDEQR]-C-[GASTDEQL]-x(3,9)-[GASTDEQN]-x(2)-[CE]-x(6,7)-C-C.

NAME: Anaphylatoxin domain profile.



NAME: Apple domain.
CONSENSUS: C-x(3)-[LIVMFY]-x(5)-[LIVMFY]-x(3)-[DENQ]-[LIVMFY]-x(10)-C-x(3)-C-T-x(4)-C-x-[LIVMFY]-F-x-[FY]-x(13)-C-x-[LIVMFY]-[RK]-x-[ST]-x(14,15)-S-G-x-[ST]-[LIVMFY]-x(2)-C.

NAME: Band 4.1 family domain signature 1.
CONSENSUS: W-[LIV]-x(3)-[KRQ]-x-[LIVM]-x(2)-[QH]-x(0,2)-[LIVMF]-x(6,8)-[LIVMF]-x(3,5)-F-[FY]-x(2)-[DENS].

NAME: Band 4.1 family domain signature 2.
CONSENSUS: [HYW]-x(9)-[DENQSTV]-[SA]-x(3)-[FY]-[LIVM]-x(2)-[ACV]-x(2)-[LM]-x(2)-[FY]-G-x-[DENQST]-[LIVMFYS].

NAME: Band 4.1 family domain profile.

NAME: C1q domain signature.
CONSENSUS: F-x(5)-[ND]-x(4)-[FYWL]-x(6)-F-x(5)-G-x-Y-x-F-x-[FY].

NAME: C-terminal cystine knot signature.
CONSENSUS: C-C-x(13)-C-x(2)-[GN]-x(12)-C-x-C-x(2,4)-C.

NAME: C-terminal cystine knot profile.

NAME: CUB domain profile.

NAME: Death domain profile.



NAME: EGF-like domain signature 1.
CONSENSUS: C-x-C-x(5)-G-x(2)-C.

NAME: EGF-like domain signature 2.
CONSENSUS: C-x-C-x(2)-[GP]-[FYW]-x(4,8)-C.

NAME: Calcium-binding EGF-like domain pattern signature.
CONSENSUS: [DEQN]-x-[DEQN](2)-C-x(3,14)-C-x(3,7)-C-x-[DN]-x(4)-[FY]-x-C.

NAME: Laminin-type EGF-like (LE) domain signature.
CONSENSUS: C-x(1,2)-C-x(5)-G-x(2)-C-x(2)-C-x(3,4)-[FYW]-x(3,15)-C.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 1.
CONSENSUS: [GAS]-W-x(7,15)-[FYW]-[LIV]-x-[LIVFA]-[GSTDEN]-x(6)-[LIVF]-x(2)-[IV]-x-[LIVT]-[QKM]-G.

NAME: Coagulation factors 5/8 type C domain (FA58C) signature 2.
CONSENSUS: P-x(8,10)-[LM]-R-x-[GE]-[LIVP]-x-G-C.



NAME: Forkhead-associated (FHA) domain profile.

NAME: Fibrinogen beta and gamma chains C-terminal domain signature.
CONSENSUS: W-W-[LIVMFYW]-x(2)-C-x(2)-[GSA]-x(2)-N-G.

NAME: Type I fibronectin domain.
CONSENSUS: C-x(6,8)-[LFY]-x(5)-[FYW]-x-[RK]-x(8,10)-C-x-C-

NAME: Type II fibronectin collagen-binding domain.
CONSENSUS: C-x(2)-P-F-x-[FYWI]-x(7)-C-x(8,10)-W-C-x(4)-[DNSR]-[FYW]-x(3,5)-[FYW]-x-[FYWI]-C.

NAME: Hemopexin domain signature.
CONSENSUS: [LIFAT]-x(3)-W-x(2,3)-[PE]-x(2)-[LIVMFY]-[DENQS]-[STA]-[AV]-[LIVMFY].

NAME: Kringle domain signature.
CONSENSUS: [FY]-C-R-N-P-[DNR].

NAME: Kringle domain profile.

NAME: LDL-receptor class A (LDLRA) domain signature.
CONSENSUS: C-[VILMA]-x(5)-C-[DNH]-x(3)-[DENQHT]-C-x(3,4)-[STADE]-[DEH]-[DE]-x(1,5)-C.

NAME: LDL-receptor class A (LDLRA) domain profile.

NAME: C-type lectin domain signature.
CONSENSUS: C-[LIVMFYATG]-x(5,12)-[WL]-x-[DNSR]-x(2)-C-x(5,6)-[FYWLIVSTA]-[LIVMSTA]-C.

NAME: C-type lectin domain profile.

NAME: Link domain signature.
CONSENSUS: C-x(15)-A-x(3,4)-G-x(3)-C-x(2)-G-x(8,9)-P-x(7)-C.
NAME: Osteonectin domain signature 1.
CONSENSUS: C-x-[DN]-x(2)-C-x(2)-G-[KRH]-x-C-x(6,7)-P-x-C-x-C-x(3,5)-C-P.

NAME: Osteonectin domain signature 2.
CONSENSUS: F-P-x-R-[IM]-x-D-W-L-x-[NQ].

NAME: Somatomedin B domain signature.
CONSENSUS: C-x-C-x(3)-C-x(5)-C-C-x-[DN]-[FY]-x(3)-C.

NAME: Thyroglobulin type-1 repeat signature.
CONSENSUS: [FYWHP]-x-P-x-C-x(3,4)-G-x-[FYW]-x(3)-Q-C-x(4,10)-C-[FYW]-C-V-x(3,4)-[SG].

NAME: P-type 'Trefoil' domain signature.
CONSENSUS: R-x(2)-C-x-[FYPST]-x(3,4)-[ST]-x(3)-C-x(4)-C-C-[FYWH].

NAME: Cellulose-binding domain, bacterial type.
CONSENSUS: W-N-[STAGR]-[STDN]-[LIVM]-x(2)-[GST]-x-[GST]-x(2)-[LIVMFT]-[GA].

NAME: Cellulose-binding domain, fungal type.
CONSENSUS: C-G-G-x(4,7)-G-x(3)-C-x(5)-C-x(3,5)-[NHGS]-x-[FYWM]-x(2)-Q-C.

NAME: Chitin recognition or binding domain signature.
CONSENSUS: C-x(4,5)-C-C-S-x(2)-G-x-C-G-x(4)-[FYW]-C.

NAME: Barwin domain signature 1.
CONSENSUS: C-G-[KR]-C-L-x-V-x-N.

NAME: Barwin domain signature 2.
CONSENSUS: V-[DN]-Y-[EQ]-F-V-[DN]-C.

NAME: BIR repeat.
CONSENSUS: [HKEPILVY]-x(2)-R-x(3,7)-[FYW]-x(11,14)-[STAN]-G-[LMF]-x-[FYHDA]-x(4)-[DESL]-x(2,3)-C-x(2)-C-x(6)-[LWA]-x(7)-H-x(4)-[PRSD]-x-C-x(2)-[LIVMA].

NAME: WAP-type 'four-disulfide core' domain signature.
CONSENSUS: C-x-{C}-[DN]-x(2)-C-x(5)-C-C.

NAME: Phorbol esters / diacylglycerol binding domain.
CONSENSUS: H-x-[LIVMFYW]-x(8,11)-C-x(2)-C-x(3)-[LIVMFC]-x(5,10)-C-x(2)-C-x(4)-[HD]-x(2)-C-x(5,9)-C.

NAME: C2 domain signature.
CONSENSUS: [ACG]-x(2)-L-x(2,3)-D-x(1,2)-[NGSTLIF]-[GTMR]-x-[STAP]-D-[PA]-[FY].

NAME: C2 domain profile.

NAME: CAP-Gly domain signature.
CONSENSUS: G-x(8,10)-[FYW]-x-G-[LIVM]-x-[LIVMFY]-x(4)-G-K-[NH]-x-G-[STAR]-x(2)-G-x(2)-[LY]-F.

NAME: Ly-6 / u-PAR domain signature.
CONSENSUS: [EQR]-C-[LIVMFYAH]-x-C-x(5,8)-C-x(3,8)-[EDNQSTV]-C-{C}-x(5)-C-x(12,24)-C.

NAME: MAM domain signature.
CONSENSUS: G-x-[LIVMFY](2)-x(3)-[STA]-x(10,11)-[LV]-x(4)-[LIVMF]-x(6,7)-C-[LIVMF]-x-F-x-[LIVMFY]-x(3)-[GSC].

NAME: MAM domain profile.

NAME: PH domain profile.

NAME: Phosphotyrosine interaction domain (PID) profile.

NAME: Src homology 2 (SH2) domain profile.

NAME: Src homology 3 (SH3) domain profile.

NAME: VWFC domain signature.
CONSENSUS: C-x(2,3)-C-x-C-x(6,14)-C-x(3,4)-C-x(2,10)-C-x(9,16)-C-C-x(2,4)-C.

NAME: WW/rsp5/WWP domain signature.
CONSENSUS: W-x(9,11)-[VFY]-[FYW]-x(6,7)-[GSTNE]-[GSTQCR]-[FYW]-x(2)-P.

NAME: WW/rsp5/WWP domain profile.

NAME: ZP domain signature.
CONSENSUS: [LIVMFYW]-x(7)-[STAPDNLR]-x(3)-[LIVMFYW]-x-[LIVMFYW]-x-[LIVMFYW]-x(2)-C-[LIVMFYW]-x-[ST]-[PSL]-x(2,4)-[DENS]-x-[STADNQLF]-x(6)-[LIVM](2)-x(3,4)-C.

NAME: S-layer homology domain signature.
CONSENSUS: [LVFYT]-x-[DA]-x(2,3)-[DNGSATPHY]-[WYFPDA]-x(4)-[LIV]-x(2)-[GTALV]-x(4,6)-[LIVFYC]-x(2)-G-x-[PGSTA]-x(2,3)-[MFYA]-x-[PGAV]-x(3,10)-[LIVMA]-[STKR]-[RY]-x-[EQ]-x-[STALIVM].

NAME: 'Homeobox' domain signature.
CONSENSUS: [LIVMFYG]-[ASLVR]-x(2)-[LIVMSTACN]-x-[LIVM]-x(4)-[LIV]-[RKNQESTAIY]-[LIVFSTNKH]-W-[FYVC]-x-[NDQTAH]-x(5)-[RKNAIMW].
NAME: 'Homeobox' domain profile.

NAME: 'Homeobox' antennapedia-type protein signature.
CONSENSUS: [LIVMFE]-[FY]-P-W-M-[KRQTA].

NAME: 'Homeobox' engrailed-type protein signature.
CONSENSUS: L-M-A-Q-G-L-Y-N.

NAME: 'Paired box' domain signature.
CONSENSUS: R-P-C-x(11)-C-V-S.

NAME: 'POU' domain signature 1.
CONSENSUS: [RKQ]-R-[LIM]-x-[LF]-G-[LIVMFY]-x-Q-x-[DNQ]-V-G.

NAME: 'POU' domain signature 2.
CONSENSUS: S-Q-[ST]-[TA]-I-[SC]-R-F-E-x-[LSQ]-x-[LI]-[ST].

NAME: Zinc finger, C2H2 type, domain.
CONSENSUS: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.

NAME: Zinc finger, C3HC4 type (RING finger), signature.
CONSENSUS: C-x-H-x-[LIVMFY]-C-x(2)-C-[LIVMYA].



NAME: Nuclear hormones receptors DNA-binding region signature.
CONSENSUS: C-x(2)-C-x-[DE]-x(5)-[HN]-[FY]-x(4)-C-x(2)-C-x(2)-F-F-x-R.

NAME: GATA-type zinc finger domain.
CONSENSUS: C-x-[DN]-C-x(4,5)-[ST]-x(2)-W-[HR]-[RK]-x(3)-[GN]-x(3,4)-C-N-[AS]-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain signature.
CONSENSUS: C-[KR]-x-C-x(3)-I-x-K-x(3)-[RG]-x(16,18)-W-[FYH]-H-x(2)-C.

NAME: Poly(ADP-ribose) polymerase zinc finger domain profile.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain signature.
CONSENSUS: [GASTPV]-C-x(2)-C-[RKHSTACW]-x(2)-[RKHQ]-x(2)-C-x(5,12)-C-x(2)-C-x(6,8)-C.

NAME: Fungal Zn(2)-Cys(6) binuclear cluster domain profile.

NAME: Prokaryotic dksA/traR C4-type zinc finger.
CONSENSUS: C-[DES]-x-C-x(3)-I-x(3)-R-x(4)-P-x(4)-C-x(2)-C.

NAME: Copper-fist domain signature.
CONSENSUS: M-[LIVMF](3)-x(3)-K-[MY]-A-C-x(2)-C-I-[KR]-x-H-[KR]-x(3)-C-x-H-x(8)-[KR]-x-[KR]-G-R-P.

NAME: Copper fist DNA binding domain profile.

NAME: Leucine zipper pattern.
CONSENSUS: L-x(6)-L-x(6)-L-x(6)-L.

NAME: bZIP transcription factors basic domain signature.

CONSENSUS: [KR]-x(1,3)-[RKSAQ]-N-x(2)-[SAQ](2)-x-[RKTAENQ]-x-R-x-[RK].

NAME: Myb DNA-binding domain repeat signature 1.

CONSENSUS: W-[ST]-x(2)-E-[DE]-x(2)-[LIV].

NAME: Myb DNA-binding domain repeat signature 2.

CONSENSUS: W-x(2)-[LI]-[SAG]-x(4,5)-R-x(8)-[YW]-x(3)-[LIVM].

NAME: Myc-type, 'helix-loop-helix' dimerization domain signature.

CONSENSUS: [DENSTAP]-K-[LIVMWAGSN]-{FYWCPHKR}-[LIVT]-[LIV]-x(2)-[STAV]-[LIVMSTAC]-x-[VMFYH]-[LIVMTA]-{P}-{P}-[LIVMSR].



NAME: p53 tumor antigen signature.

CONSENSUS: M-C-N-S-S-C-M-G-G-M-N-R-R.



NAME: CBF-A/NF-YB subunit signature.

CONSENSUS: C-V-S-E-x-I-S-F-[LIVM]-T-[SG]-E-A-[SC]-[DE]-[KRQ]-C.

NAME: CBF-B/NF-YA subunit signature.

CONSENSUS: Y-V-N-A-K-Q-Y-x-R-I-L-K-R-R-x-A-R-A-K-L-E.

NAME: 'Cold-shock' DNA-binding domain signature.

CONSENSUS: [FY]-G-F-I-x(6,7)-[DER]-[LIVM]-F-x-H-x-[STKR]-x-[LIVMFY].

NAME: CTF/NF-I signature.

CONSENSUS: R-K-R-K-Y-F-K-K-H-E-K-R.

NAME: Ets-domain signature 1.

CONSENSUS: L-[FYW]-[QEDH]-F-[LI]-[LVQK]-x-[LI]-L.

NAME: Ets-domain signature 2.

CONSENSUS: [RKH]-x(2)-M-x-Y-[DENQ]-x-[LIVM]-[STAG]-R-[STAG]-[LI]-R-x-Y.

NAME: Ets-domain profile.

NAME: Fork head domain signature 1.

CONSENSUS: [KR]-P-[PTQ]-[FYLVQH]-S-[FY]-x(2)-[LIVM]-x(3,4)-[AC]-[LIM].

NAME: Fork head domain signature 2.

CONSENSUS: W-[QKR]-[NS]-S-[LIV]-R-H.

NAME: Fork head domain profile.

NAME: HSF-type DNA-binding domain signature.

CONSENSUS: L-x(3)-[FY]-K-H-x-N-x-[STAN]-S-F-[LIVM]-R-Q-L-[NH]-x-Y-x-[FYW]-[RKH]-K-[LIVM].

NAME: Tryptophan pentad repeat (IRF family) signature. 

CONSENSUS: W-x-[DNH]-x(5)-[LIVF]-x-[IV]-P-W-x-H-x(9,10)-[DE]-x(2)-[LIVF]-F-[KRQ]-x-[WR]-A.

NAME: LIM domain signature.

CONSENSUS: C-x(2)-C-x(15,21)-[FYWH]-H-x(2)-[CH]-x(2)-C-x(2)-C-x(3)-[LIVMF].

NAME: LIM domain profile.

NAME: NF-kappa-B/Rel/dorsal domain signature.

CONSENSUS: F-R-Y-x-C-E-G.

NAME: MADS-box domain signature.

CONSENSUS: R-x-[RK]-x(5)-I-x-[DN]-x(3)-[KR]-x(2)-T-[FY]-x-[RK](3)-x(2)-[LIVM]-x-K(2)-A-x-E-[LIVM]-[ST]-x-L-x(4)-[LIVM]-x-[LIVM](3)-x(6)-[LIVMF]-x(2)-[FY].

NAME: MADS-box domain profile.

NAME: T-box domain signature 1.

CONSENSUS: L-W-x(5)-[FC]-x(3,4)-[NT]-E-M-[LIV](2)-T-x(5)-G-[RG]-[KRQ].

NAME: T-box domain signature 2.

CONSENSUS: [LIVMYW]-H-[PADH]-[DEN]-[GS]-x(3)-G-x(2)-W-M-x(3)-[IVA]-x-F.

NAME: TEA domain signature.

CONSENSUS: G-R-N-E-L-I-x(2)-Y-I-x(3)-[TC]-x(3)-R-T-[RK](2)-Q-[LIVM]-S-S-H-[LIVM]-Q-V.

NAME: Transcription factor TFIIB repeat signature.

CONSENSUS: G-[KR]-x(3)-[STAGN]-x-[LIVMYA]-[GSTA](2)-[CSAV]-[LIVM]-[LIVMFY]-[LIVMA]-[GSA]-[STAC].

NAME: Transcription factor TFIID repeat signature.

CONSENSUS: Y-x-P-x(2)-[IF]-x(2)-[LIVM](2)-x-[KRH]-x(3)-P-[RKQ]-x(3)-L-[LIVM]-F-x-[STN]-G-[KR]-[LIVM]-x(3)-G-[TAGL]-[KR]-x(7)-[AGC]-x(7)-[LIVM].

NAME: TFIIS zinc ribbon domain signature.

CONSENSUS: C-x(2)-C-x(9)-[LIVMQSAR]-[QH]-[STQL]-[RA]-[SACR]-x-[DE]-[DET]-[PGSEA]-x(6)-C-x(2,5)-C-x(3)-[FW].

NAME: TSC-22 / dip / bun family signature.

CONSENSUS: M-D-L-V-K-x-H-L-x(2)-A-V-R-E-E-V-E.

NAME: Prokaryotic transcription elongation factors signature 1.

CONSENSUS: [ST]-x(2)-[GS]-x(3)-[LI]-x(2)-E-L-x(2)-L-x(3,4)-R-x(2)-[IV]-x(3)-[LIV]-x(6)-G-D-x(2)-E-N-[GSA]-x-Y.

NAME: Prokaryotic transcription elongation factors signature 2.

CONSENSUS: S-x(S)-S-P-[LIVM]-[AG]-x-[SAG]-[LIVM]-[LIVMY]-x(4)-[DG]-[DE].

NAME: DEAD-box subfamily ATP-dependent helicases signature.

CONSENSUS: [LIVMF](2)-D-E-A-D-[RKEN]-x-[LIVMFYGSTN].

NAME: DEAH-box subfamily ATP-dependent helicases signature.

CONSENSUS: [GSAH]-x-[LIVMF](3)-D-E-[ALIV]-H-[NECR].

NAME: Eukaryotic putative RNA-binding region RNP-1 signature.

CONSENSUS: [RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM].
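
The RNP-1 pattern above uses an exclusion set: {EDRKHPCG} accepts any residue except those eight. Assuming the usual PROSITE meaning, it corresponds to a negated character class in a regular expression. This sketch validates each element before translating it, so malformed input raises an error instead of silently producing a wrong regex; element_to_regex, prosite_to_regex and the two test peptides are hypothetical.

```python
import re

ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")

def element_to_regex(element: str) -> str:
    """Translate one element such as 'G', 'x', '[RK]', '{EDRKHPCG}' or 'x(2)'."""
    m = ELEMENT.fullmatch(element)
    if m is None:
        raise ValueError(f"unrecognised pattern element: {element!r}")
    core, low, high = m.groups()
    if core == "x":
        core = "."
    elif core.startswith("{"):
        # {ABC}: any residue except A, B or C -> negated character class.
        core = "[^" + core[1:-1] + "]"
    if high is not None:
        core += "{%s,%s}" % (low, high)
    elif low is not None:
        core += "{%s}" % low
    return core

def prosite_to_regex(pattern: str) -> str:
    return "".join(element_to_regex(e) for e in pattern.rstrip(".").split("-"))

RNP1 = "[RK]-G-{EDRKHPCG}-[AGSCI]-[FY]-[LIVA]-x-[FYLM]."
rnp1_re = re.compile(prosite_to_regex(RNP1))

# Two made-up octapeptides that differ only at the excluded position.
for peptide in ("KGFGFVTY", "KGEGFVTY"):
    print(peptide, rnp1_re.fullmatch(peptide) is not None)
# KGFGFVTY True
# KGEGFVTY False
```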

NAME: Fibrillarin signature.

CONSENSUS: [GST]-[LIVMAP]-V-Y-A-[IV]-E-[FY]-[SA]-x-R-x(2)-R-[DE].

NAME: MCM family signature.

CONSENSUS: G-[IVT]-[LVAC](2)-[IVT]-D-[DE]-[FL]-[DNST].

NAME: MCM family domain.

NAME: XPA protein signature 1.

CONSENSUS: C-x-[DE]-C-x(3)-[LIVMF]-x(1,2)-D-x(2)-L-x(3)-F-x(4)-C-x(2)-C.

NAME: XPA protein signature 2.

CONSENSUS: [LIVM](2)-T-[KR]-T-E-x-K-x-[DE]-Y-[LIVMF](2)-x-D-x-[DE].

NAME: XPG protein signature 1.

CONSENSUS: [HVI]-[KRE]-P-x-[FYIL]-V-F-D-G-x(2)-[PIL]-x-[LVC].

NAME: XPG protein signature 2.

CONSENSUS: [GS]-[LIVM]-[PER]-[FYS]-[LIVM]-x-A-P-x-E-A-[DE]-[PAS]-[QS]-[CLM].



NAME: Bacterial regulatory proteins, araC family signature.

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[GSTALIV]-{FYWPGDN}-x(2)-[LIVMSA]-x(4,9)-[LIVMF]-x(2)-[LIVMSTA]-[GSTACIL]-x(3)-[GANQRF]-[LIVMFY]-x(4,5)-[LFY]-x(3)-[FYIVA]-{FYWHCM}-x(3)-[GSADENQKR]-x-[NSTAPKL]-[PARL].

NAME: Bacterial regulatory proteins, araC family DNA-binding domain profile.

NAME: Bacterial regulatory proteins, arsR family signature.

CONSENSUS: C-x(2)-D-[LIVM]-x(6)-[ST]-x(4)-S-[HYR]-[HQ].

NAME: Bacterial regulatory proteins, asnC family signature.

CONSENSUS: [GSTAP]-x(2)-[DNEA]-[LIVM]-[GSA]-x(2)-[LIVMFY]-[GN]-[LIVMST]-[ST]-x(6)-R-[LVT]-x(2)-[LIVM]-x(3)-G.

NAME: Bacterial regulatory proteins, crp family signature.

CONSENSUS: [LIVM]-[STAG]-[RHNQ]-x(2)-[LIM]-[GA]-x-[LIVMFYA]-[LIVSC]-[GA]-x-[STACN]-x(2)-[MST]-x-[GSTN]-R-x-[LIVMF]-x(2)-[LIVMF].

NAME: Bacterial regulatory proteins, deoR family signature.

CONSENSUS: R-x(3)-[LIVM]-x(3)-[LIVM]-x(16,17)-[STA]-x(2)-T-[LIVMA]-[RH]-[KRNA]-D-[LIVMF].

NAME: Bacterial regulatory proteins, gntR family signature.

CONSENSUS: [LIVAPKR]-[PILV]-x-[EQTIVMR]-x(2)-[LIVM]-x(3)-[LIVMFYK]-x-[LIVFT]-[DNGSTQ]-[RGTLV]-x-[STAIVP]-[LIVA]-x(2)-[STAGV]-[LIVMFYH]-x(2)-[LMA].

NAME: Bacterial regulatory proteins, iclR family signature.

CONSENSUS: [GA]-x(3)-[DS]-x(2)-E-x(6)-[CSA]-[LIVM]-[GSA]-x(2)-[LIVM]-[FYH]-[DN].

NAME: Bacterial regulatory proteins, lacI family signature.

CONSENSUS: [LIVM]-x-[DE]-[LIVM]-A-x(2)-[STAGV]-x-V-[GSTP]-x(2)-[STAG]-[LIVMA]-x(2)-[LIVMFYAN]-[LIVMC].

NAME: Bacterial regulatory proteins, luxR family signature.

CONSENSUS: [GDC]-x(2)-[NSTAVY]-x(2)-[IV]-[GSTA]-x(2)-[LIVMFYWCT]-x-[LIVMFYWCR]-x(3)-[NST]-[LIVM]-x(5)-[NRHSA]-[LIVMSTA]-x(2)-[ICR].

NAME: Bacterial regulatory proteins, lysR family signature.

CONSENSUS: [NQKRHSTAG]-[LIVMFYTA]-x(2)-[STAGLV]-[STAG]-x(4)-[LIVMYCTFR]-[PSTANLVER]-x-[PSTAGQV]-[PSTAGNVMF]-[LIVMFA]-[STAGH]-x(2)-[LIVMF]-x(2)-[LIVMFW]-[RKEAV]-x(2)-[LIVMFYNTAE]-x(3)-[LIMVT].

NAME: Bacterial regulatory proteins, marR family signature.

CONSENSUS: [STNA]-[LIA]-x-[RNGS]-x(4)-[LIM]-[EIV]-x(2)-[GES]-[LFYW]-[LIVC]-x(7)-[DN]-[RKQG]-[RK]-x(6)-T-x(2)-[GA].

NAME: Bacterial regulatory proteins, merR family signature.

CONSENSUS: [GSA]-x-[LIVMFA]-[ASM]-x(2)-[STACLIV]-[GSDENQR]-[LIVC]-[STANHQ]-x(3)-[LIVM]-[RHF]-x-[YW]-[DEQ]-x(2,3)-[GHDNQ]-[LIVMF](2).

NAME: Bacterial regulatory proteins, tetR family signature.

CONSENSUS: G-[LIVMFYS]-x(2,3)-[TS]-[LIVMT]-x(2)-[LIVM]-x(5)-[LIVQS]-[STAGENQH]-x-[GPAR]-x-[LIVMF]-[FYST]-x-[HFY]-[FV]-x-[DNST]-K-x(2)-[LIVM].

NAME: Transcriptional antiterminators bglG family signature. 

CONSENSUS: [ST]-x-H-x(2)-[FA](2)-[LIVM]-[EQK]-R-x(2)-[QNK].

NAME: Sigma-54 factors family signature 1.

CONSENSUS: P-[LIVM]-x-[LIVM]-x(2)-[LIVM]-A-x(2)-[LIVMF]-x(2)-[HS]-x(2)-T-[LIVM]-S-R.

NAME: Sigma-54 factors family signature 2.

CONSENSUS: R-R-T-[IV]-[AT]-K-Y-R.

NAME: Sigma-54 factors family profile.

NAME: Sigma-70 factors family signature 1.

CONSENSUS: [DE]-[LIVMF](2)-[HEQS]-x-G-x-[LIVMFA]-G-L-[LIVMFYE]-x-[GSAM]-[LIVMAP].

NAME: Sigma-70 factors family signature 2.

CONSENSUS: [STN]-x(2)-[DEQ]-[LIVM]-[GAS]-x(4)-[LIVMF]-[PSTG]-x(3)-[LIVMA]-x-[NQR]-[LIVMA]-[EQH]-x(3)-[LIVMFW]-x(2)-[LIVM].

NAME: Sigma-70 factors ECF subfamily signature.

CONSENSUS: [STAIV]-[PQDEL]-[DE]-[LIV]-[LIVTA]-Q-x-[STAV]-[LIVMFYC]-[LIVMAK]-x-[GSTAIV]-[LIMFYW]-x(15,l'O)-[STAP]-[FYW]-[LIF]-x(2)-[IV].

NAME: Sigma-54 interaction domain ATP-binding region A signature.

CONSENSUS: [LIVMFY](3)-x-G-[DEQ]-[STE]-G-[STAV]-G-K-x(2)-[LIVMFY].

NAME: Sigma-54 interaction domain ATP-binding region B signature.

CONSENSUS: [GS]-x-[LIVMF]-x(2)-A-[DNEQASH]-[GNEQ]-G-[STIM]-[LIVMFY](3)-[DE]-[EQ]-[LIVM].

NAME: Sigma-54 interaction domain C-terminal part signature.

CONSENSUS: [FYW]-P-[GS]-N-[LIVM]-R-[EQ]-L-x-[NHAT].

NAME: Sigma-54 interaction domain profile.

NAME: Single-strand binding protein family signature 1.

CONSENSUS: [LIVMF]-[NST]-[KRT]-[LIVM]-x-[LIVMF](2)-G-[NHRK]-[LIVM]-[GST]-x-[DET].

NAME: Single-strand binding protein family signature 2.

CONSENSUS: T-x-W-[HY]-[RNS]-[LIVM]-x-[LIVMF]-[FY]-[NGKR].

NAME: Bacterial histone-like DNA-binding proteins signature.

CONSENSUS: [GSK]-F-x(2)-[LIVMF]-x(4)-[RKEQA]-x(2)-[RST]-x-[GA]-x-[KN]-P-x-T.

NAME: Dps protein family signature 1.

CONSENSUS: H-[FW]-x-[LIVM]-x-G-x(5)-[LV]-H-x(3)-[DE].

NAME: Dps protein family signature 2.

CONSENSUS: [LIVMFY]-[DH]-x-[LIVM]-[GA]-E-R-x(3)-[LIF]-[GDN]-x(2)-[PA].

NAME: DNA repair protein radC family signature.

CONSENSUS: H-N-H-P-S-G.

NAME: recA signature.

CONSENSUS: A-L-[KR]-[IF]-[FY]-[STA]-[STAD]-[LIVMQ]-R.

NAME: RecF protein signature 1.

CONSENSUS: P-[ED]-x(3)-[LIVM](2)-x-G-[GSAD]-P-x(5)-R-R-x-[FY]-[LIVM]-D.

NAME: RecF protein signature 2.

CONSENSUS: [LIVMFY](2)-x-D-x(2,3)-[CSA]-[EH]-L-D-x(2)-[KRH]-x(3)-L.

NAME: RecR protein signature.

CONSENSUS: C-x(2)-C-x(3)-[ST]-x(4)-C-x-I-C-x(4)-R.

NAME: Histone H2A signature.

CONSENSUS: [AC]-G-L-x-F-P-V.

NAME: Histone H2B signature.

CONSENSUS: [KR]-E-[LIVM]-[EQ]-T-x(2)-[KR]-x-[LIVM](2)-x-[PAG]-[DE]-L-x-[KR]-H-A-[LIVM]-[STA]-E-G.

NAME: Histone H3 signature 1.

CONSENSUS: K-A-P-R-K-Q-L.

NAME: Histone H3 signature 2.

CONSENSUS: P-F-x-[RA]-L-[VA]-[KRQ]-[DEG]-[IV].

NAME: Histone H4 signature.

CONSENSUS: G-A-K-R-H.

NAME: HMG1/2 signature.

CONSENSUS: [FI]-S-[KR]-K-C-S-[EK]-R-W-K-T-M.

NAME: HMG-I and HMG-Y DNA-binding domain (A+T-hook).

CONSENSUS: [AT]-x(1,2)-[RK](2)-[GP]-R-G-R-P-[RK]-x.

NAME: HMG14 and HMG17 signature.

CONSENSUS: R-R-S-A-R-L-S-A-[RK]-P.

NAME: Bromodomain signature.

CONSENSUS: [STANVF]-x(2)-F-x(4)-[DNS]-x(5,7)-[DENQTF]-Y-[HFY]-x(2)-[LIVMFY]-x(3)-[LIVM]-x(4)-[LIVM]-x(6,8)-Y-x(12,13)-[LIVM]-x(2)-N-[SACF]-x(2)-[FY].

NAME: Bromodomain profile.

NAME: Chromo domain signature.

CONSENSUS: [FYL]-x-[LIVMC]-[KR]-W-x-[GDNR]-[FYWLE]-x(5,6)-[ST]-W-[ES]-[PSTDN]-x(3)-[LIVMC].

NAME: Chromo and chromo shadow domain profile.

NAME: Regulator of chromosome condensation (RCC1) signature 1.

CONSENSUS: G-x-N-D-x(2)-[AV]-L-G-R-x-T.

NAME: Regulator of chromosome condensation (RCC1) signature 2.

CONSENSUS: [LIVMFA]-[STAGC](2)-G-x(2)-H-[STAGLI]-[LIVMFA]-x-[LIVM].

NAME: Protamine P1 signature.

CONSENSUS: [AV]-R-[NFY]-R-x(2,3)-[ST]-x-S-x-S.

NAME: Nuclear transition protein 1 signature.

CONSENSUS: S-K-R-K-Y-R-K.

NAME: Nuclear transition protein 2 signature 1.

CONSENSUS: H-x(3)-H-S-[NS]-S-x-P-Q-S.

NAME: Nuclear transition protein 2 signature 2.

CONSENSUS: K-x-R-K-x(2)-E-G-K-x(2)-K-[KR]-K.

NAME: Ribosomal protein L1 signature.

CONSENSUS: [IM]-x(2)-[LIVA]-x(2,3)-[LIVM]-G-x(2)-[LMS]-[GSNH]-[PTKR]-[KRAV]-G-x-[LMFI]-P-[DENSTQ].

NAME: Ribosomal protein L2 signature.

CONSENSUS: P-x(2)-R-G-[STAIV](2)-x-N-[APK]-x-[DE].

NAME: Ribosomal protein L3 signature.

CONSENSUS: [FL]-x(6)-[DN]-x(2)-[AGS]-x-[ST]-x-G-[KRH]-G-x(2)-G-x(3)-R.

NAME: Ribosomal protein L5 signature.

CONSENSUS: [LIVM]-x(2)-[LIVM]-[STAC]-[GE]-[QV]-x(2)-[LIVMA]-x-[STC]-x-[STAG]-[KR]-x-[STA].

NAME: Ribosomal protein L6 signature 1.

CONSENSUS: [PS]-[DENS]-x-Y-K-[GA]-K-G-[LIVM].

NAME: Ribosomal protein L6 signature 2.

CONSENSUS: Q-x(3)-[LIVM]-x(2)-[KR]-x(2)-R-x-F-x-D-G-[LIVM]-Y-[LIVM]-x(2)-[KR].

NAME: Ribosomal protein L9 signature.

CONSENSUS: G-x(2)-[GN]-x(4)-V-x(2)-G-[FY]-x(2)-N-[FY]-L-x(5)-[GA]-x(3)-[STN].

NAME: Ribosomal protein L10 signature.

CONSENSUS: [DEH]-x(2)-[GS]-[LIVMF]-[STN]-[VA]-x-[DEQ]-[LIVMA]-x(2)-[LM]-R.

NAME: Ribosomal protein L11 signature.

CONSENSUS: [RKN]-x-[LIVM]-x-G-[SQ]-x(2)-[SNQ]-[LIVP]-G-x(2)-[LIVM]-x(0,1)-[DENG].

NAME: Ribosomal protein L13 signature.

CONSENSUS: [LIVM]-[KRV]-[GK]-M-[LIV]-[PS]-x(4,5)-[GS]-[NQEKRA]-x(5)-[LIVM]-x-[AIV]-[LFY]-x-[GDN].

NAME: Ribosomal protein L14 signature.

CONSENSUS: [GA]-[LIV](3)-x(10)-[DNS]-G-x(4)-[FY]-x(2)-[NT]-x(2)-V-[LIV].

NAME: Ribosomal protein L15 signature.

CONSENSUS: K-[LIVM](2)-[GAL]-x-[GT]-x-[LIVMA]-x(2,5)-[LIVM]-x-[LIVMF]-x(3,4)-[LIVMFC]-[ST]-x(2)-A-x(3)-[LIVM]-x(3)-G.

NAME: Ribosomal protein L16 signature 1.

CONSENSUS: [KR]-R-x-[GSAC]-[KQVA]-[LIVM]-W-[LIVM]-[KR]-[LIVM]-[LFY]-[AP].

NAME: Ribosomal protein L16 signature 2.

CONSENSUS: R-M-G-x-[GR]-K-G-x(4)-[FWKR].

NAME: Ribosomal protein L17 signature.

CONSENSUS: I-x-[ST]-[GT]-x(2)-[KR]-x-K-x(6)-[DE]-x-[LIMV]-[LIVMT]-T-x-[STAG]-[KR].

NAME: Ribosomal protein L19 signature.

CONSENSUS: [RT]-[KRSVY]-[GSA]-x-V-[RS]-[KR]-[SA]-K-L-Y-Y-L-R.

NAME: Ribosomal protein L20 signature.

CONSENSUS: K-x(3)-[KRC]-x-[LIVM]-W-[IV]-[STNALV]-R-[LIVM]-N-x(3)-[RKH].

NAME: Ribosomal protein L21 signature.

CONSENSUS: [IVT]-x(3)-[KR]-x(3)-[KRQ]-K-x(6)-G-[HF]-R-[RQ]-x(2)-T.

NAME: Ribosomal protein L22 signature.

CONSENSUS: [RKQN]-x(4)-[RH]-[GAS]-x-G-[KRQS]-x(9)-[HDN]-[LIVM]-x-[LIVMS]-x-[LIVM].

NAME: Ribosomal protein L23 signature.

CONSENSUS: [RK](2)-[AM]-[IVFYT]-[IV]-[RKT]-L-[STANQK]-x(7)-[LIVMFT].

NAME: Ribosomal protein L24 signature.

CONSENSUS: [GDEN]-D-x-V-x-[IV]-[LIVMA]-x-G-x(2)-[KA]-[GN]-x(2,3)-[GA]-x-[IV].

NAME: Ribosomal protein L27 signature.

CONSENSUS: G-x-[LIVM](2)-x-R-Q-R-G-x(5)-G.

NAME: Ribosomal protein L29 signature.

CONSENSUS: [KNQS]-[PSTL]-x(2)-[LIMFA]-[KRGSAN]-x-[LIVYSTA]-[KR]-[KRH]-[DESTANRL]-[LIV]-A-[KRCQVT]-[LIVMA].

NAME: Ribosomal protein L30 signature.

CONSENSUS: [IVT]-[LIVM]-x(2)-[LF]-x-[LI]-x-[KRHQEG]-x(2)-[STNQH]-x-[IVT]-x(10)-[LMS]-[LIV]-x(2)-[LIVA]-x(2)-[LMFY]-[IVT].

NAME: Ribosomal protein L31 signature.

CONSENSUS: H-P-F-[FY]-[TI]-x(9)-G-R-[AV]-x-[KR].

NAME: Ribosomal protein L33 signature.

CONSENSUS: Y-x-[ST]-x-[KR]-[NS]-x(4)-[PAT]-x(1,2)-[LIVM]-[EA]-x(2)-K-[FY]-[CSD].

NAME: Ribosomal protein signature.

CONSENSUS: K-[RG]-T-[FYWL]-[EQS]-x(5)-[KRHS]-x(4,5)-G-F-x(2)-R.

NAME: Ribosomal protein L35 signature.

CONSENSUS: [LIVM]-K-[TV]-x(2)-[GSA]-[SAIL]-x-K-R-[LIVMFY]-[KRL].

NAME: Ribosomal protein L36 signature.

CONSENSUS: C-x(2)-C-x(2)-[LIVM]-x-R-x(3)-[LIVMN]-x-[LIVM]-x-C-x(3,4)-[KR]-H-x-Q-x-Q.

NAME: Ribosomal protein L1e signature.

CONSENSUS: N-x(3)-[KR]-x(2)-A-[LIVT]-x-S-A-[LIV]-x-A-[ST]-[SGA]-x(7)-[RQ]-G-H.



NAME: Ribosomal protein L6e signature.

CONSENSUS: N-x(S)-P-L-R-R-x(4)-[FY]-V-I-A-T-S-x-K.

NAME: Ribosomal protein L7Ae signature.

CONSENSUS: [CA]-x(4)-[IV]-P-[FY]-x(2)-[LIVM]-x-[GSQ]-[KRQ]-x(2)-L-G.

NAME: Ribosomal protein L10e signature.

CONSENSUS: R-x-A-[FYW]-G-K-[PA]-x-G-x(2)-A-R-V.

NAME: Ribosomal protein L13e signature.

CONSENSUS: [KR]-Y-x(2)-K-[LIVM]-R-[STA]-G-[KR]-G-F-[ST]-L-x-E.

NAME: Ribosomal protein L15e signature.

CONSENSUS: [DE]-[KR]-A-R-x-L-G-[FY]-x-[SAP]-x(2)-G-[LIVMFY](4)-R-x-R-V-x-R-G.

NAME: Ribosomal protein L18e signature.

CONSENSUS: [KRE]-x-L-x(2)-[PS]-[KR]-x(2)-[RH]-[PSA]-x-[LIVM]-[NS]-[LIVMW]-x-[RQ]-[LIVM].

NAME: Ribosomal protein L19e signature.

CONSENSUS: R-x-[KR]-x(5)-[KR]-x(3)-[KRH]-x(2)-G-x-G-x-R-x-G-x(3)-A-R-x(3)-[KQ]-x(2)-W-x(7)-R-x(2)-L-x(3)-R.

NAME: Ribosomal protein L21e signature.

CONSENSUS: G-[DE]-x-V-x(10)-[GV]-x(2)-[FYH]-x(2)-[FY]-x-G-x-T-G.

NAME: Ribosomal protein L24e signature.

CONSENSUS: [FY]-x-[GS]-x(2)-[IV]-x-P-G-x-G-x(2)-[FYV]-x-[KRHE]-x-D.

NAME: Ribosomal protein L27e signature.

CONSENSUS: G-K-N-x-W-F-F-x-K-L-R-F.

NAME: Ribosomal protein L30e signature 1.

CONSENSUS: [STA]-x(5)-G-x-[QKR]-x(2)-[LIVM]-[KQT]-x(2)-[KR]-x-G-x(2)-K-x-[LIVM](3).

NAME: Ribosomal protein L30e signature 2.

CONSENSUS: [DE]-L-G-[STA]-x(2)-G-[KR]-x(6)-[LIVM]-x-[LIVM]-x-[DEN]-x-G.

NAME: Ribosomal protein L31e signature.

CONSENSUS: V-[KR]-[LIVM]-x(3)-[LIVM]-N-x-[AC]-x-W-x-[KR]-G.

NAME: Ribosomal protein L32e signature.

CONSENSUS: F-x-R-x(4)-[KR]-x(2)-[KR]-[LIVM]-x(3)-W-R-[KR]-x(2)-G.

NAME: Ribosomal protein L34e signature.

CONSENSUS: Y-x-[ST]-x-S-[NY]-x(S)-[KR]-T-P-G.

NAME: Ribosomal protein L35Ae signature.

CONSENSUS: G-K-[LIVM]-x-R-x-H-G-x(2)-G-x-V-x-A-x-F-x(3)-[LI]-P.

NAME: Ribosomal protein L36e signature.

CONSENSUS: P-Y-E-[KR]-R-x-[LIVM]-[DE]-[LIVM](2)-[KR].

NAME: Ribosomal protein L37e signature.

CONSENSUS: G-T-x-[SA]-x-G-x-[KR]-x(3)-[ST]-x(0,1)-H-x(2)-C-x-R-C-G.

NAME: Ribosomal protein L39e signature.

CONSENSUS: [KRA]-T-x(3)-[LIVM]-[KRQF]-x-[NHS]-x(3)-R-[NHY]-W-R-R.



NAME: Ribosomal protein L44e signature.

CONSENSUS: K-x-[TV]-K-K-x(2)-L-[KR]-x(2)-C.

NAME: Ribosomal protein S2 signature 1.

CONSENSUS: [LIVMFA]-x(2)-[LIVMFYC](2)-x-[STAC]-[GSTANQEKR]-[STALV]-[HY]-[LIVMF]-G.

NAME: Ribosomal protein S2 signature 2.

CONSENSUS: P-x(2)-[LIVMF](2)-[LIVMS]-x-[GDN]-x(3)-[DENL]-x(3)-[LIVM]-x-E-x(4)-[GNQKRH]-[LIVM]-[AP].

NAME: Ribosomal protein S3 signature.

CONSENSUS: [GSTA]-[KR]-x(6)-G-x-[LIVMT]-x(2)-[NQSCH]-x(1,3)-[LIVFCA]-x(3)-[LIV]-[DENQ]-x(7)-[LMT]-x(2)-G-x(2)-G.

NAME: Ribosomal protein S4 signature.

CONSENSUS: [LIVM]-[DE]-x-R-L-x(3)-[LIVMC]-[VMFYHQ]-[KRT]-x(3)-[STAGCF]-x-[ST]-x(3)-[SAI]-[KR]-x-[LIVMF](2).

NAME: Ribosomal protein S5 signature.

CONSENSUS: G-[KRQ]-x(3)-[FY]-x-[ACV]-x(2)-[LIVMA]-[LIVM]-[AG]-[DN]-x(2)-G-x-[LIVM]-G-x-[SAG]-x(5,6)-[DEQ]-[LIVM]-x(2)-A-[LIVMF].

NAME: Ribosomal protein S6 signature.

CONSENSUS: G-x-[KRC]-[DENQRH]-L-[SA]-Y-x-I-[KRNSA].



NAME: Ribosomal protein S7 signature.

CONSENSUS: [DENSK]-x-[LIVMET]-x(3)-[LIVMFT](2)-x(6)-G-K-[KR]-x(5)-[LIVMF]-[LIVMFC]-x(2)-[STA].

NAME: Ribosomal protein S8 signature.

CONSENSUS: [GE]-x(2)-[LIV](2)-[STY]-T-x(2)-G-[LIVM](2)-x(4)-[AG]-[KRHAYI].

NAME: Ribosomal protein S9 signature.

CONSENSUS: G-G-G-x(2)-[GSA]-Q-x(2)-[SA]-x(3)-[GSA]-x-[GSTAV]-[KR]-[GSAL]-[LIF].

NAME: Ribosomal protein S10 signature.

CONSENSUS: [AV]-x(3)-[GDNSR]-[LIVMSTA]-x(3)-G-P-[LIVM]-x-[LIVM]-P-T.

NAME: Ribosomal protein S11 signature.

CONSENSUS: [LIVMF]-x-[GSTAC]-[LIVMF]-x(2)-[GSTAL]-x(0,1)-[GSN]-[LIVMF]-x-[LIVM]-x(4)-[DEN]-x-T-P-x-[PA]-[STCH]-[DN].

NAME: Ribosomal protein S12 signature.

CONSENSUS: [RK]-x-P-N-S-[AR]-x-R.

NAME: Ribosomal protein S13 signature.

CONSENSUS: [KRQS]-G-x-R-H-x(2)-[GSNH]-x(2)-[LIVMC]-R-G-Q.

NAME: Ribosomal protein S14 signature.

CONSENSUS: [RP]-x(0,1)-C-x(11,12)-[LIVMF]-x-[LIVMF]-[SC]-[RG]-x(3)-[RN].

NAME: Ribosomal protein S15 signature.

CONSENSUS: [LIVM]-x(2)-H-[LIVMFY]-x(5)-D-x(2)-[SAGN]-x(3)-[LF]-x(9)-[LIVM]-x(2)-[FY].

NAME: Ribosomal protein S16 signature.

CONSENSUS: [LIVMT]-x-[LIVM]-[KR]-L-[STAK]-R-x-G-[AKR].

NAME: Ribosomal protein S17 signature.

CONSENSUS: G-D-x-[LIV]-x-[LIVA]-x-[QEK]-x-[RK]-P-[LIV]-S.

NAME: Ribosomal protein S18 signature.

CONSENSUS: [IV]-[DY]-Y-x(2)-[LIVMT]-x(2)-[LIVM]-x(2)-[FYT]-[LIVM]-[ST]-[DERP]-x-[GY]-K-[LIVM]-x(3)-R-[LIVMAS].

NAME: Ribosomal protein S19 signature.

CONSENSUS: [STDNQ]-G-[KRQM]-x(6)-[LIVM]-x(4)-[LIVM]-[GSD]-x(2)-[LF]-[GAS]-[DE]-F-x(2)-[ST].

NAME: Ribosomal protein S21 signature.

CONSENSUS: [DE]-x-A-[LY]-[KR]-R-F-K-[KR]-x(3)-[KR].



NAME: Ribosomal protein S3Ae signature.

CONSENSUS: [LIV]-x-[GH]-R-[IV]-x-E-x-[SC]-L-x-D-L.

NAME: Ribosomal protein S4e signature.

CONSENSUS: H-x-K-R-[LIVM]-[SAN]-x-P-x(2)-W-x-[LIVM]-x-[KR].

NAME: Ribosomal protein S6e signature.

CONSENSUS: [LIVM]-[STAMR]-G-G-x-D-x(2)-G-x-P-M.

NAME: Ribosomal protein S7e signature.

CONSENSUS: [KR]-L-x-R-E-L-E-K-K-F-[SAP]-x-[KR]-H.

NAME: Ribosomal protein S8e signature.

CONSENSUS: R-x(2)-T-G-[GA]-x(5)-[HR]-K-[KR]-x-K-x-E-[LM]-G.

NAME: Ribosomal protein S12e signature.

CONSENSUS: A-L-[KRQP]-x-V-L-x(2)-[SA]-x(3)-[DN]-G-L.

NAME: Ribosomal protein S17e signature.

CONSENSUS: A-x-I-x-[ST]-K-x-L-R-N-[KR]-I-A-G-[FY]-x-T-H.

NAME: Ribosomal protein S19e signature.

CONSENSUS: P-x(6)-[SAN]-x(2)-[LIVMA]-x-R-x-[ALIV]-[LV]-Q-x-L-[EQ].

NAME: Ribosomal protein S21e signature.

CONSENSUS: L-Y-V-P-R-K-C-S-[SA].

NAME: Ribosomal protein S24e signature.

CONSENSUS: [FA]-G-x(2)-[KR]-[STA]-x-G-[FY]-[GA]-x-[LIVM]-Y-[DN]-[SN].

NAME: Ribosomal protein S26e signature.

CONSENSUS: [YH]-C-V-S-C-A-I-H.

NAME: Ribosomal protein S27e signature.

CONSENSUS: [QK]-C-x(2)-C-x(6)-F-[GS]-x-[PSA]-x(5)-C-x(2)-C-[GS]-x(2)-L-x(2)-P-x-G.

NAME: Ribosomal protein S28e signature.

CONSENSUS: E-[ST]-E-R-E-A-R-x-L.

NAME: DNA mismatch repair proteins mutL / hexB / PMS1 signature.

CONSENSUS: G-F-R-G-E-A-L.

NAME: DNA mismatch repair proteins mutS family signature.

CONSENSUS: [ST]-[LIVM]-x-[LIVM]-x-D-E-[LIVMY]-[GA]-[RKH]-G-[GST]-x(4)-G.

NAME: mutT domain signature.

CONSENSUS: G-x(5)-E-x(4)-[STAGC]-[LIVMAC]-x-R-E-[LIVMFT]-x-E-E.

NAME: DnaA protein signature.

CONSENSUS: I-[GA]-x(2)-[LIVMF]-[SGDNK]-x(0,1)-[KR]-x-H-[STP]-[STV]-[LIVM](2)-x-[SA]-x(2)-[KRE]-[LIVM].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 1.

CONSENSUS: K-x-E-[LIV]-A-x-[DEN]-[LIVMF]-G-[LIVMF].

NAME: Small, acid-soluble spore proteins, alpha/beta type, signature 2.

CONSENSUS: [KR]-[SAQ]-x-G-x-V-G-G-x-[LIVM]-x-[KR](2)-[LIVM](2).

NAME: Zinc-containing alcohol dehydrogenases signature.

CONSENSUS: G-H-E-x(2)-G-x(5)-[GA]-x(2)-[IVSAC].

NAME: Quinone oxidoreductase / zeta-crystallin signature.

CONSENSUS: [GSD]-[DEQH]-x(2)-L-x(3)-[SA](2)-G-G-x-G-x(4)-Q-x(2)-[KR].



NAME: Iron-containing alcohol dehydrogenases signature 1.

CONSENSUS: [STALIV]-[LIVF]-x-[DE]-x(6,7)-P-x(4)-[ALIV]-x-[GST]-x(2)-D-[TAIVM]-[LIVMF]-x(4)-E.

NAME: Iron-containing alcohol dehydrogenases signature 2.

CONSENSUS: [GSW]-x-[LIVTSACD]-[GH]-x^-[GSAE]-[GSHYQ]-x-[LIVTP]-[GAST]-[GAS]-x(3)-[LIVMT]-x-[HNS]-[GAM]-x-[GTAC].

NAME: Short-chain dehydrogenases/reductases family signature.

CONSENSUS: [LIVSPADNK]-x(12)-Y-[PSTAGNCV]-[STAGNQCIVM]-[STAGC]-K-{PC}-[SAGFR]-[LIVMSTAGD]-x(2)-[LIVMFYW]-x(3)-[LIVMFYWGAPTHQ]-[GSACQRHM].

NAME: Aldo/keto reductase family signature 1.

CONSENSUS: G-[FY]-R-[HSAL]-[LIVMF]-D-[STAGC]-[AS]-x(5)-E-x(2)-[LIVM]-G.

NAME: Aldo/keto reductase family signature 2.

CONSENSUS: [LIVMFY]-x(9)-[KREQ]-x-[LIVM]-G-[LIVM]-[SC]-N-[FY].

NAME: Aldo/keto reductase family putative active site signature.

CONSENSUS: [LIVM]-[PAIV]-[KR]-[ST]-x(4)-R-x(2)-[GSTAEQK]-[NSL]-x(2)-[LIVMFA].

NAME: Homoserine dehydrogenase signature.

CONSENSUS: A-x(3)-G-[LIVMFY]-[STAG]-x(2,3)-[DNS]-P-x(2)-D-[LIVM]-x-G-x-D-x(3)-K.

NAME: NAD-dependent glycerol-3-phosphate dehydrogenase signature.

CONSENSUS: G-[AT]-[LIVM]-K-[DN]-[LIVM](2)-A-x-[GA]-x-G-[LIVMF]-x-[DE]-G-[LIVM]-x-[LIVMFYW]-G-x-N.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 1.

CONSENSUS: [IV]-G-G-G-x(2)-G-[STACV]-G-x-A-x-D-x(3)-R-G.

NAME: FAD-dependent glycerol-3-phosphate dehydrogenase signature 2.

CONSENSUS: G-G-K-x(2)-[GSTE]-Y-R-x(2)-A.

NAME: Mannitol dehydrogenases signature.

CONSENSUS: [LIVMY]-x-[FS]-x(2)-[STAGCV]-x-V-D-R-[IV]-x-[PS].

NAME: Histidinol dehydrogenase signature.

CONSENSUS: D-x(2)-A-G-P-[ST]-E-[LIVS]-[LIVMA](3)-[AC]-x(3)-A-x(4)-[LIVM]-[AV]-[SACL]-[DE]-[LIVMFC]-[LIVM]-[SA]-x(2)-E-H.

NAME: L-lactate dehydrogenase active site.

CONSENSUS: [LIVMA]-G-[EQ]-H-G-[DN]-[ST].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases NAD-binding signature.

CONSENSUS: [LIVMA]-[AG]-[IVT]-[LIVMFY]-[AG]-x-G-[NHKRQGSAC]-[LIV]-G-x(13,14)-[LIVMT]-x(2)-[FYWCTH]-[DNSTK].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 2.

CONSENSUS: [LIVMFYWA]-[LIVFYWC]-x(2)-[SAC]-[DNQHR]-[IVFA]-[LIVF]-x-[LIVF]-[HNI]-x-P-x(4)-[STN]-x(2)-[LIVMF]-x-[GSDN].

NAME: D-isomer specific 2-hydroxyacid dehydrogenases signature 3.

CONSENSUS: [LMFATC]-[KPQ]-x-[GSTDN]-x-[LIVMFYWR]-[LIVMFYW](2)-N-x-[STAGC]-R-[GP]-x-[LIVMW]-[LIVMC]-[DNV].

NAME: 3-hydroxyisobutyrate dehydrogenase signature.

CONSENSUS: [LIVMFY]-G-L-G-x-[MQ]-G-x-[PGS]-[MA]-[SA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 1.

CONSENSUS: [RKH]-x(6)-D-x-M-G-x-N-x-[LIVMA].

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 2.

CONSENSUS: [LIVM]-G-x-[LIVM]-G-G-[AG]-T.

NAME: Hydroxymethylglutaryl-coenzyme A reductases signature 3.

CONSENSUS: A-[LIVM]-x-[STAN]-x(2)-[LI]-x-[KRNQ]-[GSA]-H-[LM]-x-[FYLH].

NAME: Hydroxymethylglutaryl-coenzyme A reductases profile.

NAME: 3-hydroxyacyl-CoA dehydrogenase signature.

CONSENSUS: [DNE]-x(2)-[GA]-F-[LIVMFY]-x-[NT]-R-x(3)-[PA]-[LIVMFY](2)-x(5)-[LIVMFYCT]-[LIVMFY]-x(2)-[GV].

NAME: Malate dehydrogenase active site signature.

CONSENSUS: [LIVM]-T-[TRKMN]-L-D-x(2)-R-[STA]-x(3)-[LIVMFY].

NAME: Malic enzymes signature.

CONSENSUS: F-x-[DV]-D-x(2)-G-T-[GSA]-x-[IV]-x-[LIVMA]-[GAST](2)-[LIVMF](2).

NAME: Isocitrate and isopropylmalate dehydrogenases signature.

CONSENSUS: [NS]-[LIMYT]-[FYDN]-G-[DNT]-[IMVY]-x-[STGDN]-[DN]-x(2)-[SGAP]-x(3,4)-G-[STG]-[LIVMPA]-G-[LIVMF].

NAME: 6-phosphogluconate dehydrogenase signature.

CONSENSUS: [LIVM]-x-D-x(8)-[GA]-[NQS]-K-G-T-G-x-W.

NAME: Glucose-6-phosphate dehydrogenase active site.

CONSENSUS: D-H-Y-L-G-K-[EQK].

NAME: IMP dehydrogenase / GMP reductase signature.

CONSENSUS: [LIVM]-[RK]-[LIVM]-G-[LIVM]-G-x-G-S-[LIVM]-C-x-T.



NAME: Bacterial quinoprotein dehydrogenases signature 1.

CONSENSUS: [DEN]-W-x(3)-G-[RK]-x(6)-[FYW]-S-x(4)-[LIVM]-N-x(2)-N-V-x(2)-L-[RK].

NAME: Bacterial quinoprotein dehydrogenases signature 2.

CONSENSUS: W-x(4)-Y-D-x(3)-[DN]-[LIVMFY](4)-x(2)-G-x(2)-[STA]-P.

NAME: FMN-dependent alpha-hydroxy acid dehydrogenases active site.

CONSENSUS: S-N-H-G-[AG]-R-Q.

NAME: GMC oxidoreductases signature 1.

CONSENSUS: [GA]-[RKN]-x-[LIV]-G(2)-[GST](2)-x-[LIVM]-N-x(3)-[FYWA]-x(2)-[PAG]-x(S)-[DNESH].

NAME: GMC oxidoreductases signature 2.

CONSENSUS: [GS]-[PSTA]-x(2)-[ST]-P-x-[LIVM](2)-x(2)-S-G-[LIVM]-G.

NAME: Eukaryotic molybdopterin oxidoreductases signature.

CONSENSUS: [GA]-x(3)-[KRNQHT]-x(11,14)-[LIVMFYWS]-x(8)-[LIVMF]-x-C-x(5)-[DEN]-R-x(2)-[DE].

NAME: Prokaryotic molybdopterin oxidoreductases signature 1.

CONSENSUS: [STAN]-x-[CH]-x(2,3)-C-[STAG]-[GSTVMF]-x-C-x-[LIVMFYW]-x-[LIVMA]-x(3,4)-[DENQKHT].

NAME: Prokaryotic molybdopterin oxidoreductases signature 2.

CONSENSUS: [STA]-x-[STAC](2)-x(2)-[STA]-D-[LIVMY](2)-L-P-x-[STAC](2)-x(2)-E.

NAME: Prokaryotic molybdopterin oxidoreductases signature 3.

CONSENSUS: A-x(3)-[GDT]-I-x-[DNQT]-x-[DEA]-x-[LIVM]-x-[LIVMC]-x-[NS]-x(2)-[GS]-x(S)-A-x-[LIVM]-[ST].

NAME: Aldehyde dehydrogenases glutamic acid active site.

CONSENSUS: [LIVMFGA]-E-[LIMSTAC]-[GS]-G-[KNLM]-[SADN]-[TAPFV].

NAME: Aldehyde dehydrogenases cysteine active site.

CONSENSUS: [FYLVA]-x(3)-G-[QE]-x-C-[LIVMGSTANC]-[AGCN]-x-[GSTADNEKR].

NAME: Aspartate-semialdehyde dehydrogenase signature.

CONSENSUS: [LIVM]-[SADN]-x(2)-C-x-R-[LIVM]-x(4)-[GSC]-H-[STA].



NAME: Glyceraldehyde 3-phosphate dehydrogenase active site.

CONSENSUS: [ASV]-S-C-[NT]-T-x(2)-[LIM].

NAME: N-acetyl-gamma-glutamyl-phosphate reductase active site.

CONSENSUS: [LIVM]-[GSA]-x-P-G-C-[FY]-[AVP]-T-[GA]-x(3)-[GTAC]-[LIVM]-x-P.

NAME: Gamma-glutamyl phosphate reductase signature.

CONSENSUS: V-x(5)-A-[LIV]-x-H-I-x(2)-[HY]-[GS]-[ST]-x-H-[ST]-[DE]-x-I.

NAME: Dihydrodipicolinate reductase signature.

CONSENSUS: E-[IV]-x-E-x-H-x(3)-K-x-D-x-P-S-G-T-A.

NAME: Dihydroorotate dehydrogenase signature 1.

CONSENSUS: [GS]-x(4)-[GK]-[STA]-[IVSTA]-[GT]-x(3)-[NQR]-x-G-[NH]-x(2)-P-[RT].

NAME: Dihydroorotate dehydrogenase signature 2.

CONSENSUS: [LIV](2)-[GSA]-x-G-G-[IV]-x-[STGN]-x(3)-[ACV]-x(6)-G-A.

NAME: Coproporphyrinogen III oxidase signature.

CONSENSUS: K-x-W-C-x(2)-[FYH](3)-[LIVM]-x-H-R-x-E-x-R-G-[LIVM]-G-G-[LIVM]-F-F-D.






WO 01/98454 PCT/IB01/02050 

NAME: Fumarate reductase / succinate dehydrogenase FAD-binding site.

CONSENSUS: R-[ST]-H-[ST]-x(2)-A-x-G-G.

NAME: Acyl-CoA dehydrogenases signature 1.

CONSENSUS: [GAC]-[LIVM]-[ST]-E-x(2)-[GSAN]-G-[ST]-D-x(2)-[GSA].

NAME: Acyl-CoA dehydrogenases signature 2.

CONSENSUS: [QDE]-x(2)-G-[GS]-x-G-[LIVMFY]-x(2)-[DEN]-x(4)-[KR]-x(3)-[DEN].

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 1.

CONSENSUS: G-[LIVM]-P-x-E-x(3)-N-E-x(1,3)-R-V-A-x-[ST]-P-x-[GST]-V-x(2)-L-x-[KRH]-x-G.

NAME: Alanine dehydrogenase & pyridine nucleotide transhydrogenase signature 2.

CONSENSUS: [LIVM](2)-G-[GA]-G-x-A-G-x(2)-[SA]-x(3)-[GA]-x-[SG]-[LIVM]-G-A-x-V-x(3)-D.

NAME: Glu / Leu / Phe / Val dehydrogenases active site.

CONSENSUS: [LIV]-x(2)-G-G-[SAG]-K-x-[GV]-x(3)-[DNST]-[PL].



NAME: D-amino acid oxidases signature.

CONSENSUS: [LIVM](2)-H-[NHA]-Y-G-x-[GSA](2)-x-G-x(5)-G-x-A.

NAME: Pyridoxamine 5'-phosphate oxidase signature.

CONSENSUS: [LIVF]-E-F-W-[QHG]-x(4)-R-[LIVM]-H-[DNE]-R.



NAME: Copper amine oxidase topaquinone signature.

CONSENSUS: [LIVM]-[LIVMA]-[LIVM]-x(4)-T-x(2)-N-Y-[DE]-[YN].

NAME: Copper amine oxidase copper-binding site signature.

CONSENSUS: T-x-G-x(2)-H-[LIVMF]-x(3)-E-[DE]-x-P.

NAME: Lysyl oxidase putative copper-binding region signature.

CONSENSUS: W-E-W-H-S-C-H-Q-H-Y-H.

NAME: Delta 1-pyrroline-5-carboxylate reductase signature.

CONSENSUS: [PALF]-x(2,3)-[LIV]-x(3)-[LIVM]-[STAC]-[LSTV]-x-[GAN]-G-x-T-x(2)-[AG]-[LIV]-x(2)-[LMF]-[DENQK].

NAME: Dihydrofolate reductase signature.

CONSENSUS: [LVAGC]-[LIF]-G-x(4)-[LIVMF]-P-W-x(4,5)-[DE]-x(3)-[FYIV]-x(3)-[STIQ].

NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 1.

CONSENSUS: [EQ]-x-[EQK]-[LIVM](2)-x(2)-[LIVM]-x(2)-[LIVMY]-N-x-[DN]-x(4)-[LIVMF](3)-Q-L-P-[LV].
NAME: Tetrahydrofolate dehydrogenase/cyclohydrolase signature 2.

CONSENSUS: P-G-G-V-G-P-[MF]-T-[IVA].

NAME: Oxygen oxidoreductases covalent FAD-binding site.

CONSENSUS: P-x(10)-[DE]-[LIVM]-x(3)-[LIVM]-x(7)-[LIVM]-x(3)-[GSA]-[GST]-G-H.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-I active site.

CONSENSUS: G-G-x-C-[LIVA]-x(2)-G-C-[LIVM]-P.

NAME: Pyridine nucleotide-disulphide oxidoreductases class-II active site.

CONSENSUS: C-x(2)-C-D-[GA]-x(2,4)-[FY]-x(4)-[LIVM]-x-[LIVM](2)-G(3)-[DN].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 1.

CONSENSUS: G-[LIVMFYKRS]-[LIVMAGP]-Q-x-[LIVMFY]-x-D-[AGIM]-[LIVMFTA]-K-[LVMYST]-[LIVMFYG]-x-[KR]-[EQG].

NAME: Respiratory-chain NADH dehydrogenase subunit 1 signature 2.

CONSENSUS: P-F-D-[LIVMFYQ]-[STAGPVM]-E-[GAC]-E-x-[EQ]-[LIVMS]-x(2)-G.

NAME: Respiratory-chain NADH dehydrogenase 20 Kd subunit signature.

CONSENSUS: [GN]-x-D-[KRST]-[LIVMF](2)-P-[IV]-D-[LIVMFYW](2)-x-P-x-C-P-[PT].

NAME: Respiratory-chain NADH dehydrogenase 24 Kd subunit signature.

CONSENSUS: D-x(2)-F-[ST]-x(5)-C-L-G-x-C-x(2)-[GA]-P.

NAME: Respiratory chain NADH dehydrogenase 30 Kd subunit signature.

CONSENSUS: E-R-E-x(2)-[DE]-[LIVMF](2)-x(6)-[HK]-x(3)-[KRP]-x-[LIVM]-[LIVMS].

NAME: Respiratory chain NADH dehydrogenase 49 Kd subunit signature.

CONSENSUS: [LIVMH]-H-[RT]-[GA]-x-E-K-[LIVMT]-x-E-x-[KRQ].

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 1.

CONSENSUS: G-[AM]-G-[AR]-Y-[LIVM]-C-G-[DE](2)-[STA](2)-[LIM](2)-[EN]-S.

NAME: Respiratory-chain NADH dehydrogenase 51 Kd subunit signature 2.

CONSENSUS: E-S-C-G-x-C-x-P-C-R-x-G.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 1.

CONSENSUS: P-x(2)-C-[YWS]-x(7)-G-x-C-R-x-C.
NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 2.

CONSENSUS: C-P-x-C-[DE]-x-[GS](2)-x-C-x-L-Q.

NAME: Respiratory-chain NADH dehydrogenase 75 Kd subunit signature 3.

CONSENSUS: R-C-[LIVM]-x-C-x-R-C-[LIVM]-x-[FY].

NAME: Nitrite and sulfite reductases iron-sulfur/siroheme-binding site.

CONSENSUS: [STV]-G-C-x(3)-C-x(6)-[DE]-[LIVMF]-[GAT]-[LIVMF].

NAME: Uricase signature.

CONSENSUS: L-x-[LV]-L-K-[ST]-T-x-S-x-F-x(2)-[FY]-x(4)-[FY].

NAME: Heme-copper oxidase catalytic subunit, copper B binding region signature.

CONSENSUS: [YWG]-[LIVFYWTA](2)-[VGS]-H-[LNP]-x-V-x(44,47)-H-H.

NAME: CO II and nitrous oxide reductase dinuclear copper centers signature.

CONSENSUS: V-x-H-x(33,40)-C-x(3)-C-x(3)-H-x(2)-M.

NAME: Cytochrome c oxidase subunit Vb zinc binding region signature.

CONSENSUS: [LIVM](2)-[FYW]-x(10)-C-x(2)-C-G-x(2)-[FY]-K-L.

NAME: Multicopper oxidases signature 1.

CONSENSUS: G-x-[FYW]-x-[LIVMFYW]-x-[CST]-x(8)-G-[LM]-x(3)-[LIVMFYW].

NAME: Multicopper oxidases signature 2.

CONSENSUS: H-C-H-x(3)-H-x(3)-[AG]-[LM].

NAME: Peroxidases proximal heme-ligand signature.

CONSENSUS: [DET]-[LIVMTA]-x(2)-[LIVM]-[LIVMSTAG]-[SAG]-[LIVMSTAG]-H-[STA]-[LIVMFY].



NAME: Peroxidases active site signature.

CONSENSUS: [SGATV]-x(3)-[LIVMA]-R-[LIVMA]-x-[FW]-H-x-[SAC].



NAME: Catalase proximal heme-ligand signature.

CONSENSUS: R-[LIVMFSTAN]-F-[GASTNP]-Y-x-D-[AST]-[QEH].

NAME: Catalase proximal active site signature.

CONSENSUS: [IF]-x-[RH]-x(4)-[EQ]-R-x(2)-H-x(2)-[GAS]-[GASTF]-[GAST].

NAME: Glutathione peroxidases selenocysteine active site.

CONSENSUS: [GN]-[RKHNFYC]-x-[LIVMFC]-[LIVMF](2)-x-N-[VT]-x-[ST]-x-C-[GA]-x-T.

NAME: Glutathione peroxidases signature 2.

CONSENSUS: [LIV]-[AGD]-F-P-[CS]-[NG]-Q-F.

NAME: Lipoxygenases iron-binding region signature 1.

CONSENSUS: H-[EQ]-x(3)-H-x-[LM]-[NQR]-[GST]-H-[LIVMSTAC](3)-E.

NAME: Lipoxygenases iron-binding region signature 2.

CONSENSUS: [LIVMA]-H-P-[LIVM]-x-[KR]-[LIVMF](2)-x-[AP]-H.

NAME: Extradiol ring-cleavage dioxygenases signature.

CONSENSUS: [GNTIV]-x-H-x(5,7)-[LIVMF]-Y-x(2)-[DENTA]-P-x-[GP]-x(2,3)-E.

NAME: Intradiol ring-cleavage dioxygenases signature.

CONSENSUS: [LIVM]-x-G-x-[LIVM]-x(4)-[GS]-x(2)-[LIVM]-x(4)-[LIVM]-[DE]-[LIVMFY]-x(6)-G-x-[FY].

NAME: Indoleamine 2,3-dioxygenase signature 1.

CONSENSUS: G-G-S-[AN]-[GA]-Q-S-S-x(2)-Q.



NAME: Indoleamine 2,3-dioxygenase signature 2.

CONSENSUS: [FY]-L-[DQ]-[DE]-[LIVM]-x(2)-Y-M-x(3)-H-[KR].

NAME: Bacterial ring hydroxylating dioxygenases alpha-subunit signature.

CONSENSUS: C-x-H-R-[GA]-x(8)-G-N-x(5)-C-x-[FY]-H.

NAME: Bacterial luciferase subunits signature.

CONSENSUS: [GA]-[LIVM]-P-[LIVM]-x-[LIVMFY]-x-W-x(6)-[RK]-x(6)-Y-x(3)-[AR].

NAME: ubiH/COQ6 monooxygenase family signature.

CONSENSUS: H-P-[LIV]-[AG]-G-Q-G-x-N-x-G-x(2)-D.

NAME: Biopterin-dependent aromatic amino acid hydroxylases signature.

CONSENSUS: P-D-x(2)-H-[DE]-[LI]-[LIVMF]-G-H-[LIVMC]-P.

NAME: Copper type II ascorbate-dependent monooxygenases signature 1.

CONSENSUS: H-H-M-x(2)-F-x-C.

NAME: Copper type II ascorbate-dependent monooxygenases signature 2.

CONSENSUS: H-x-F-x(4)-H-T-H-x(2)-G.

NAME: Tyrosinase CuA-binding region signature.

CONSENSUS: H-x(4,5)-F-[LIVMFTP]-x-[FW]-H-R-x(2)-[LM]-x(3)-E.

NAME: Tyrosinase and hemocyanins CuB-binding region signature.

CONSENSUS: D-P-x-F-[LIVMFYW]-x(2)-H-x(3)-D.

NAME: Fatty acid desaturases family 1 signature.

CONSENSUS: G-E-x-[FY]-H-N-[FY]-H-H-x-F-P-x-D-Y.

NAME: Fatty acid desaturases family 2 signature.

CONSENSUS: [ST]-[SA]-x(3)-[QR]-[LI]-x(5,6)-D-Y-x(2)-[LIVMFYW]-[LIVM]-[DE].
NAME: Cytochrome P450 cysteine heme-iron ligand signature.

CONSENSUS: [FW]-[SGNH]-x-[GD]-x-[RHPT]-x-C-[LIVMFAP]-[GAD].

NAME: Heme oxygenase signature.

CONSENSUS: L-L-V-A-H-A-Y-T-R.

NAME: Copper/Zinc superoxide dismutase signature 1.

CONSENSUS: [GA]-[IFAT]-H-[LIVF]-H-x(2)-[GP]-[SDG]-x-[STAGD].

NAME: Copper/Zinc superoxide dismutase signature 2.

CONSENSUS: G-[GN]-[SGA]-G-x-R-x-[SGA]-C-x(2)-[IV].



NAME: Manganese and iron superoxide dismutases signature.

CONSENSUS: D-x-W-E-H-[STA]-[FY](2).

NAME: Ribonucleotide reductase large subunit signature.

CONSENSUS: W-x(2)-[LF]-x(6,7)-G-[LIVM]-[FYRA]-[NH]-x(3)-[STAQLIVM]-[AS]-x(2)-[PA].

NAME: Ribonucleotide reductase small subunit signature.

CONSENSUS: [IVMSEQ]-E-x(1,2)-[LIVTA]-[HY]-[GSA]-x-[STAVM]-Y-x(2)-[LIVMQ]-x(3)-[LIFY]-[IVFYCSA].

NAME: Nitrogenases component 1 alpha and beta subunits signature 1.

CONSENSUS: [LIVMFYH]-[LIVMFST]-H-[AG]-[AGSP]-[LIVMNQA]-[AG]-C.

NAME: Nitrogenases component 1 alpha and beta subunits signature 2.

CONSENSUS: [STAN]-[ET]-C-x(5)-G-D-[DN]-[LIVMT]-x-[STAGR]-[LIVMFYST].

NAME: NifH/frxC family signature 1.

CONSENSUS: E-x-G-G-P-x(2)-[GA]-x-G-C-[AG]-G.



NAME: NifH/frxC family signature 2.

CONSENSUS: D-x-L-G-D-V-V-C-G-G-F-[AG]-x-P.



NAME: Nickel-dependent hydrogenases large subunit signature 1.

CONSENSUS: R-G-[LIVMF]-E-x(15)-[QESM]-R-x-C-G-[LIVM]-C.

NAME: Nickel-dependent hydrogenases large subunit signature 2.

CONSENSUS: [FY]-D-P-C-[LIM]-[ASG]-C-x(2,3)-H.

NAME: Glutamyl-tRNA reductase signature.

CONSENSUS: H-[LIVM]-x(2)-[LIVM]-[GSTAC](3)-[LIVM]-[DEQ]-S-[LIVMA]-[LIVM](2)-[GF]-E-x-[QR]-[IV]-[LIT]-[STAG]-Q-[LIVM]-[KR].

NAME: Bacterial-type phytoene dehydrogenase signature.

CONSENSUS: [NG]-x-[FYWV]-[LIVMF]-x-G-[AGC]-[GS]-[TA]-[HQT]-P-G-[STAV]-G-[LIVM]-x(5)-[GS].



NAME: Glycine radical signature.

CONSENSUS: [STIV]-x-R-[IVT]-[CSA]-G-Y-x-[GACV].

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 1.

CONSENSUS: G-x(2)-[LIVM]-Y-D-x-[FY]-x-G-x(2)-L-N-P-R.

NAME: Ergosterol biosynthesis ERG4/ERG24 family signature 2.

CONSENSUS: [LIVM](2)-H-R-x(2)-R-D-x(3)-C-x(2)-K-Y-G.

NAME: NNMT/PNMT/TEMT family of methyltransferases signature.

CONSENSUS: L-I-D-I-G-S-G-P-T-[IV]-Y-Q-L-L-S-A-C.



NAME: RNA methyltransferase trmA family signature 1.

CONSENSUS: [DN]-P-[PA]-R-x-G-x(14,16)-[LIVM](2)-Y-x-S-C-N-x(2)-T.

NAME: RNA methyltransferase trmA family signature 2.

CONSENSUS: [LIVMF]-D-x-F-P-[QHY]-[ST]-x-H-[LIVMFY]-E.

NAME: Thymidylate synthase active site.

CONSENSUS: R-x(2)-[LIVM]-x(3)-[FW]-[QN]-x(8,9)-[LV]-x-P-C-[HAVM]-x(3)-[QMT]-[FYW]-x-[LV].

NAME: Ribosomal RNA adenine dimethylases signature.

CONSENSUS: [LIVM]-[LIVMFY]-[DE]-x-G-[STAPV]-G-x-[GA]-x-[LIVMF]-[ST]-x(2)-[LIVM]-x(6)-[LIVMY]-x-[STAGV]-[LIVMFYHC]-E-x-D.

NAME: Methylated-DNA--protein-cysteine methyltransferase active site.

CONSENSUS: [LIVMF]-P-C-H-R-[LIVMF](2).

NAME: N-6 Adenine-specific DNA methylases signature.

CONSENSUS: [LIVMAC]-[LIVFYWA]-x-[DN]-P-P-[FYW].



NAME: N-4 cytosine-specific DNA methylases signature.

CONSENSUS: [LIVMF]-T-S-P-P-[FY].

NAME: C-5 cytosine-specific DNA methylases active site.

CONSENSUS: [DENKS]-x-[FLIV]-x(2)-[GSTC]-x-P-C-x(2)-[FYWLIM]-S.



NAME: C-5 cytosine-specific DNA methylases C-terminal signature.

CONSENSUS: [RKQGTF]-x(2)-G-N-[STAG]-[LIVMF]-x(3)-[LIVMT]-x(3)-[LIVM]-x(3)-[LIVM].

NAME: Protein-L-isoaspartate (D-aspartate) O-methyltransferase signature.

CONSENSUS: [GSA]-D-G-x(2)-G-[FYWV]-x(3)-[AS]-P-[FY]-[DN]-x-I.

NAME: Uroporphyrin-III C-methyltransferase signature 1.

CONSENSUS: [LIVM]-[GS]-[STAL]-G-P-G-x(3)-[LIVMFY]-[LIVM]-T-[LIVM]-[KRHQG]-[AG].

NAME: Uroporphyrin-III C-methyltransferase signature 2.

CONSENSUS: V-x(2)-[LI]-x(2)-G-D-x(3)-[FYW]-[GS]-x(8)-[LIVF]-x(5,6)-[LIVMFYWPAC]-x-[LIVMY]-x-P-G.

NAME: ubiE/COQ5 methyltransferase family signature 1.

CONSENSUS: Y-D-x-M-N-x(2)-[LIVM]-S-x(3)-H-x(2)-W.

NAME: ubiE/COQ5 methyltransferase family signature 2.

CONSENSUS: R-V-[LIVM]-K-[PV]-G-G-x-[LIVMF]-x(2)-[LIVM]-E-x-S.



NAME: Serine hydroxymethyltransferase pyridoxal-phosphate attachment site.

CONSENSUS: [DEH]-[LIVMFY]-x-[STMV]-[GST]-[ST](2)-H-K-[ST]-[LF]-x-G-[PAC]-[RQ]-[GSA]-[GA].

NAME: Phosphoribosylglycinamide formyltransferase active site.

CONSENSUS: G-x-[STM]-[IVT]-x-[FYWVQ]-[VMAT]-x-[DEVM]-x-[LIVMY]-D-x-G-x(2)-[LIVT]-x(6)-[LIVM].

NAME: Aspartate and ornithine carbamoyltransferases signature.

CONSENSUS: F-x-[EK]-x-S-[GT]-R-T.

NAME: Transketolase signature 1.

CONSENSUS: R-x(3)-[LIVMTA]-[DENQSTHKF]-x(5,6)-[GSN]-G-H-[PLIVMF]-[GSTA]-x(2)-[LIMC]-[GS].

NAME: Transketolase signature 2.

CONSENSUS: G-[DEQGSA]-[DN]-G-[PAEQ]-[ST]-[HQ]-x-[PAGM]-[LIVMYAC]-[DEFYWN]-x(2)-[STAP]-x(2)-[RGA].

NAME: Transaldolase signature 1.

CONSENSUS: [DG]-[IVSA]-T-[ST]-N-P-[STA]-[LIVMF](2).

NAME: Transaldolase active site.

CONSENSUS: [LIVM]-x-[LIVM]-K-[LIVM]-[PAS]-x-[ST]-x-[PENQ]-G-[LIVM]-x-[AGV]-x-[QEKRST]-x-[LIVM].

NAME: Acyltransferases ChoActase / COT / CPT family signature 1.

CONSENSUS: [LI]-P-x-[LVP]-P-[IVTA]-P-x-[LIVM]-x-[DENQAS]-[ST]-[LIVM]-x(2)-[LY].

NAME: Acyltransferases ChoActase / COT / CPT family signature 2.

CONSENSUS: R-[FYW]-x-[DA]-[KA]-x(0,1)-[LIVMFY]-x-[LIVMFY](2)-x(3)-[DNS]-[GSA]-x(10)-[DE]-[HS]-x(3)-[DE]-[GA].



NAME: Thiolases acyl-enzyme intermediate signature.

CONSENSUS: [LIVM]-[NST]-x(2)-C-[SAGLI]-[ST]-[SAG]-[LIVMFYNS]-x-[STAG]-[LIVM]-x(6)-[LIVM].

NAME: Thiolases signature 2.

CONSENSUS: N-x(2)-G-G-x-[LIVM]-[SA]-x-G-H-P-x-G-x-[ST]-G.

NAME: Thiolases active site.

CONSENSUS: [AG]-[LIVMA]-[STAGLIVM]-[STAG]-[LIVMA]-C-x-[AG]-x-[AG]-x-[AG]-x-[SAG].

NAME: Chloramphenicol acetyltransferase active site.

CONSENSUS: Q-[LIV]-H-H-[SA]-x(2)-D-G-[FY]-H.

NAME: Hexapeptide-repeat containing-transferases signature.

CONSENSUS: [LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIVAC]-x-[LIV]-[GAED]-x(2)-[STAVR]-x-[LIV]-[GAED]-x(2)-[STAV]-x-[LIV]-x(3)-[LIV].

NAME: Beta-ketoacyl synthases active site.

CONSENSUS: G-x(4)-[LIVMFAP]-x(2)-[AGC]-C-[STA](2)-[STAG]-x(3)-[LIVMF].

NAME: Chalcone and stilbene synthases active site.

CONSENSUS: R-[LIVMFYS]-x-[LIVM]-x-[QHG]-x-G-C-[FYNA]-[GAD]-G-[GA]-[STAV]-x-[LIVMF]-[RA].

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 1.

CONSENSUS: E-I-N-F-L-C-x-H-K.

NAME: Myristoyl-CoA:protein N-myristoyltransferase signature 2.

CONSENSUS: K-F-G-x-G-D-G.

NAME: Gamma-glutamyltranspeptidase signature.

CONSENSUS: T-[STA]-H-x-[ST]-[LIVMA]-x(4)-G-[SN]-x-V-[STA]-x-T-x-T-[LIVM]-[NE]-x(1,2)-[FY]-G.

NAME: Transglutaminases active site.

CONSENSUS: [GT]-Q-[CA]-W-V-x-[SA]-[GA]-[IVT]-x(2)-T-x-[LMSC]-R-[CSA]-[LV]-G.

NAME: Phosphorylase pyridoxal-phosphate attachment site.

CONSENSUS: E-A-[SC]-G-x-[GS]-x-M-K-x(2)-[LM]-N.

NAME: UDP-glycosyltransferases signature.

CONSENSUS: [FW]-x(2)-Q-x(2)-[LIVMYA]-[LIMV]-x(4,6)-[LVGAC]-[LVFYA]-[LIVMF]-[STAGCM]-[HNQ]-[STAGC]-G-x(2)-[STAG]-x(3)-[STAGL]-[LIVMFA]-x(4)-[PQR]-[LIVMT]-x(3)-[PA]-x(3)-[DES]-[QEHN].

NAME: Purine/pyrimidine phosphoribosyl transferases signature.

CONSENSUS: [LIVMFYWCTA]-[LIVM]-[LIVMA]-[LIVMFC]-[DE]-D-[LIVMS]-[LIVM]-[STAVD]-[STAR]-[GAC]-x-[STAR].

NAME: Glutamine amidotransferases class-I active site.

CONSENSUS: [PAS]-[LIVMFYT]-[LIVMFY]-G-[LIVMFY]-C-[LIVMFYN]-G-x-[QEH]-x-[LIVMFA].

NAME: Glutamine amidotransferases class-II active site.

CONSENSUS: <x(0,11)-C-[GS]-[IV]-[LIVMFYW]-[AG].

NAME: Purine and other phosphorylases family 1 signature.

CONSENSUS: [GST]-x-G-[LIVM]-G-x-[PA]-S-x-[GSTA]-I-x(3)-E-L.

NAME: Purine and other phosphorylases family 2 signature.

CONSENSUS: [LIV]-x(3)-G-x(4)-H-x-[LIVMFY]-x(4)-[LIVMF]-x(3)-[ATV]-x(1,5)-[LIVM]-x-[ATV]-x(4)-[GN]-x(3,4)-[LIVMF](2)-x(2)-[STN]-[SA]-x-G-[GS]-[LIVM].

NAME: Thymidine and pyrimidine-nucleoside phosphorylases signature.

CONSENSUS: S-[GS]-R-[GA]-[LIV]-x(2)-[TA]-[GA]-G-T-x-D-x-[LIV]-E.

NAME: ATP phosphoribosyltransferase signature.

CONSENSUS: E-x(5)-G-x-[SAG]-x(2)-[IV]-x-D-[LIV]-x(2)-[ST]-G-x-T-[LM].



NAME: NAD:arginine ADP-ribosyl transferases signature.

CONSENSUS: [FY]-x-[FY]-K-x(2)-H-[FY]-x-L-[ST]-x-A.

NAME: Prolipoprotein diacylglyceryl transferase signature.

CONSENSUS: G-R-x-[GA]-N-F-[LIVMF]-N-x-E-x(2)-G.



NAME: S-adenosylmethionine synthetase signature 1.

CONSENSUS: G-A-G-D-Q-G-x(3)-G-Y.

NAME: S-adenosylmethionine synthetase signature 2.

CONSENSUS: G-[GA]-G-[AS]-F-S-x-K-[DE].

NAME: Polyprenyl synthetases signature 1.

CONSENSUS: [LIVM](2)-x-D-D-x(2,4)-D-x(4)-R-R-[GH].

NAME: Polyprenyl synthetases signature 2.

CONSENSUS: [LIVMFY]-G-x(2)-[FYL]-Q-[LIVM]-x-D-D-[LIVMFY]-x-[DNG].

NAME: Squalene and phytoene synthases signature 1.

CONSENSUS: Y-[CSAM]-x(2)-[VSG]-A-[GSA]-[LIVAT]-[IV]-G-x(2)-[LMSC]-x(2)-[LIV].



NAME: Squalene and phytoene synthases signature 2.

CONSENSUS: [LIVM]-G-x(3)-Q-x(2,3)-N-[IF]-x-R-D-[LIVMFY]-x(2)-[DE]-x(4,7)-R-x-[FY]-x-P.

NAME: Protein prenyltransferases alpha subunit repeat signature.

CONSENSUS: [PSIAV]-x-[NDFV]-[NEQY]-x-[LIVMAGP]-W-[NQSTHF]-[FYHQ]-[LIVMR].

NAME: Riboflavin synthase alpha chain family signature.

CONSENSUS: [LIVMF]-x(5)-G-[STADNC]-[KREQIY]-V-N-[LIVM]-E.

NAME: Dihydropteroate synthase signature 1.

CONSENSUS: [LIVM]-x-[AG]-[LIVMF](2)-N-x-T-x-D-S-F-x-D-x-[SG].

NAME: Dihydropteroate synthase signature 2.

CONSENSUS: [GE]-[SA]-x-[LIVM](2)-D-[LIVM]-G-[GP]-x(2)-[STA]-x-P.

NAME: EPSP synthase signature 1.

CONSENSUS: [LIVM]-x(2)-[GN]-N-[SA]-G-T-[STA]-x-R-x-[LIVMY]-x-[GSTA].

NAME: EPSP synthase signature 2.

CONSENSUS: [KR]-x-[KH]-E-[CST]-[DNE]-R-[LIVM]-x-[STA]-[LIVMC]-x(2)-[EN]-[LIVMF]-x-[KRA]-[LIVMF]-G.



NAME: FLAP/GST2/LTC4S family signature.

CONSENSUS: G-x(3)-F-E-R-V-[FY]-x-A-[NQ]-x-N-C.



NAME: Aminotransferases class-I pyridoxal-phosphate attachment site.

CONSENSUS: [GS]-[LIVMFYTAC]-[GSTA]-K-x(2)-[GSALVN]-[LIVMFA]-x-[GNAR]-x-R-[LIVMA]-[GA].

NAME: Aminotransferases class-II pyridoxal-phosphate attachment site.

CONSENSUS: T-[LIVMFYW]-[STAG]-K-[SAG]-[LIVMFYWR]-[SAG]-x(2)-[SAG].

NAME: Aminotransferases class-III pyridoxal-phosphate attachment site.

CONSENSUS: [LIVMFYWC](2)-x-D-E-[LIVMA]-x(2)-[GP]-x(0,1)-[LIVMFYWAG]-x(0,1)-[SACR]-x-[GSAD]-x(12,16)-D-[LIVMFYWC]-x(2,3)-[GSA]-K-x(3)-[GSTADN]-[GSA].

NAME: Aminotransferases class-IV signature.

CONSENSUS: E-x-[STAGCI]-x(2)-N-[LIVMFAC]-[FY]-x(6,12)-[LIVMF]-x-T-x(6,8)-[LIVM]-x-[GS]-[LIVM]-x-[KR].

NAME: Aminotransferases class-V pyridoxal-phosphate attachment site.

CONSENSUS: [LIVFYCHT]-[DGH]-[LIVMFYAC]-[LIVMFYA]-x(2)-[GSTAC]-[GSTA]-[HQR]-K-x(4,6)-G-x-[GSAT]-x-[LIVMFYSAC].

NAME: Hexokinases signature.

CONSENSUS: [LIVM]-G-F-[TN]-F-S-[FY]-P-x(5)-[LIVM]-[DNST]-x(3)-[LIVM]-x(2)-W-T-K-x-[LF].
NAME: Galactokinase signature.

CONSENSUS: G-R-x-N-[LIV]-I-G-E-H-x-D-Y.

NAME: GHMP kinases putative ATP-binding domain.

CONSENSUS: [LIVM]-[PK]-x-[GSTA]-x(0,1)-G-L-[GS]-S-S-[GSA]-[GSTAC].

NAME: Phosphofructokinase signature.

CONSENSUS: [RK]-x(4)-G-H-x-Q-[QR]-G-G-x(5)-D-R.

NAME: pfkB family of carbohydrate kinases signature 1.

CONSENSUS: [AG]-G-x(0,1)-[GAP]-x-N-x-[STA]-x(6)-[GS]-x(9)-G.

NAME: pfkB family of carbohydrate kinases signature 2.

CONSENSUS: [DNSK]-[PSTV]-x-[SAG](2)-[GD]-D-x(3)-[SAGV]-[AG]-[LIVMFY]-[LIVMSTAP].

NAME: ROK family signature.

CONSENSUS: [LIVM]-x(2)-G-[LIVMFCT]-G-x-[GA]-[LIVMFA]-x(8)-G-x(3,5)-[GATP]-x(2)-G-[RKH].

NAME: Phosphoribulokinase signature.

CONSENSUS: K-[LIVM]-x-R-D-x(3)-R-G-x-[ST]-x-E.

NAME: Thymidine kinase cellular-type signature.

CONSENSUS: [GA]-x(1,2)-[DE]-x-Y-x-[STAP]-x-C-[NKR]-x-[CH]-[LIVMFYWH].

NAME: FGGY family of carbohydrate kinases signature 1.

CONSENSUS: [MFYGS]-x-[PST]-x(2)-K-[LIVMFYW]-x-W-[LIVMF]-x-[DENQTKR]-[ENQH].

NAME: FGGY family of carbohydrate kinases signature 2.

CONSENSUS: [GSA]-x-[LIVMFYW]-x-G-[LIVM]-x(7,8)-[HDENA]-[LIVMF]-x(2)-[AS]-[STAIVM]-[LIVMFY]-[DEQ].

NAME: Protein kinases ATP-binding region signature.

CONSENSUS: [LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K.
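Variable gaps such as x(5,18) in the protein kinase ATP-binding region signature mean that one pattern can match stretches of different lengths. The sketch below assumes standard PROSITE syntax: each '-'-separated element matches one residue unless it carries a repeat count (n) or (n,m). Under that assumption, it computes the shortest and longest span a pattern allows. `pattern_length_range` is a hypothetical helper and is not part of the original document.

```python
import re


def pattern_length_range(pattern: str) -> tuple[int, int]:
    """Return the (minimum, maximum) number of residues a PROSITE-style
    pattern can span. An element without a repeat count contributes one
    residue, 'e(n)' contributes n, and 'e(n,m)' contributes n to m."""
    shortest = longest = 0
    for element in pattern.strip().rstrip(".").split("-"):
        # A repeat count, if any, sits at the end of the element
        # (optionally followed by a C-terminal anchor '>').
        count = re.search(r"\((\d+)(?:,(\d+))?\)>?$", element)
        if count is None:
            low = high = 1
        else:
            low = int(count.group(1))
            high = int(count.group(2)) if count.group(2) is not None else low
        shortest += low
        longest += high
    return shortest, longest


if __name__ == "__main__":
    atp_site = ("[LIV]-G-{P}-G-{P}-[FYWMGSTNH]-[SGA]-{PW}-[LIVCAT]-{PD}-x-"
                "[GSTACLIVMFY]-x(5,18)-[LIVMFYWCSTAR]-[AIVP]-[LIVMFAGCKR]-K.")
    print(pattern_length_range(atp_site))  # (21, 34)
```

Sixteen fixed positions plus the 5 to 18 residue gap give a window of 21 to 34 residues. This is the stretch of sequence a scanner must examine to decide whether this signature is present.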

NAME: Serine/Threonine protein kinases active-site signature.

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-K-x(2)-N-[LIVMFYCT](3).

NAME: Tyrosine protein kinases specific active-site signature.

CONSENSUS: [LIVMFYC]-x-[HY]-x-D-[LIVMFY]-[RSTAC]-x(2)-N-[LIVMFYC](3).

NAME: Protein kinase domain profile.

NAME: Casein kinase II regulatory subunit signature.

CONSENSUS: C-P-x-[LIVMY]-x-C-x(5)-L-P-[LIVMC]-G-x(9)-V-[KR]-x(2)-C-P-x-C.

NAME: Pyruvate kinase active site signature.

CONSENSUS: [LIVAC]-x-[LIVM](2)-[SAPCV]-K-[LIV]-E-[NKRST]-x-[DEQH]-[GSTA]-[LIVM].

NAME: Shikimate kinase signature.

CONSENSUS: [KR]-x(2)-E-x(3)-[LIVMF]-x(6,12)-[LIVMF](2)-[SA]-x-G(3)-x-[LIVMF].

NAME: Prokaryotic diacylglycerol kinase signature.

CONSENSUS: E-x-[LIVM]-N-[ST]-[SA]-[LIV]-E-x(2)-V-D.

NAME: Phosphatidylinositol 3- and 4-kinases signature 1.

CONSENSUS: [LIVMFAC]-K-x(1,3)-[DEA]-[DE]-[LIVMC]-R-Q-[DE]-x(4)-Q.

NAME: Phosphatidylinositol 3- and 4-kinases signature 2.

CONSENSUS: [GS]-x-[AV]-x(3)-[LIVM]-x(2)-[FYH]-[LIVM](2)-x-[LIVMF]-x-D-R-H-x(2)-N.



NAME: Acetate and butyrate kinases family signature 1.

CONSENSUS: [LIVM](2)-x-[LIVM]-N-x-G-S-[ST]-S-x-[KE].

NAME: Acetate and butyrate kinases family signature 2.

CONSENSUS: [LIVMA](2)-x(2)-H-x-G-x-G-x-[ST]-[LIVM]-x-[AV]-x(3)-G.



NAME: Phosphoglycerate kinase signature.

CONSENSUS: [KRHGTCV]-[VT]-[LIVMF]-[LIVMC]-R-x-D-x-N-[SACV]-P.



NAME: Aspartokinase signature.

CONSENSUS: [LIVM]-x-K-[FY]-G-G-[ST]-[SC]-[LIVM].

NAME: Glutamate 5-kinase signature.

CONSENSUS: [GSTN]-x(2)-G-x-G-[GC]-[IM]-x-[STA]-K-[LIVM]-x-[SA]-[TCA]-x(2)-[GALV]-x(3)-G.

NAME: ATP:guanido phosphotransferases active site.

CONSENSUS: C-P-x(0,1)-[ST]-N-[IL]-G-T.



NAME: PTS HPR component histidine phosphorylation site signature.

CONSENSUS: G-[LIVM]-H-[STA]-R-[PA]-[GSTA]-[STAM].

NAME: PTS HPR component serine phosphorylation site signature.

CONSENSUS: [GSADE]-[KREQTV]-x(4)-[KRN]-S-[LIVMF](2)-x-[LIVM]-x(2)-[LIVM]-[GAD].



NAME: PTS EIIA domains phosphorylation site signature 1.

CONSENSUS: G-x(2)-[LIVMF](3)-H-[LIVMF]-G-[LIVMF]-x-T-[ALV].

NAME: PTS EIIA domains phosphorylation site signature 2.

CONSENSUS: [DENQ]-x(6)-[LIVMF]-[GA]-x(2)-[LIVM]-A-[LIVM]-P-H-[GAC].
NAME: PTS EIIB domains cysteine phosphorylation site signature.

CONSENSUS: N-[LIVMFY]-x(5)-C-x-T-R-[LIVMF]-x-[LIVMF]-x-[LIVM]-x-[DQ].

NAME: Adenylate kinase signature.

CONSENSUS: [LIVMFYW](3)-D-G-[FYI]-P-R-x(3)-[NQ].

NAME: Nucleoside diphosphate kinases active site.

CONSENSUS: N-x(2)-H-[GA]-S-D-[SA]-[LIVMPKNE].

NAME: Guanylate kinase signature.

CONSENSUS: T-[ST]-R-x(2)-[KR]-x(2)-[DE]-x(2)-G-x(2)-Y-x-[FY]-[LIVMK].

NAME: Guanylate kinase domain profile.

NAME: Phosphoribosyl pyrophosphate synthetase signature.

CONSENSUS: D-[LI]-H-[SA]-x-Q-[IMST]-[QM]-G-[FY]-F-x(2)-P-[LIVMFC]-D.

NAME: 7,8-dihydro-6-hydroxymethylpterin-pyrophosphokinase signature.

CONSENSUS: G-[PE]-R-x(2)-D-L-D-[LIVM](2).

NAME: Bacteriophage-type RNA polymerase family active site signature 1.

CONSENSUS: P-[LIVM]-x(2)-D-[GA]-[ST]-[AC]-[SN]-[GA]-[LIVMFY]-Q.

NAME: Bacteriophage-type RNA polymerase family active site signature 2.

CONSENSUS: [LIVMF]-x-R-x(3)-K-x(2)-[LIVMT]-M-[PT]-x(2)-Y.



NAME: Eukaryotic RNA polymerase II heptapeptide repeat.

CONSENSUS: Y-[ST]-P-[ST]-S-P-[STANK].



NAME: RNA polymerases beta chain signature.

CONSENSUS: G-x-K-[LIVMFA]-[STAC]-[GSTN]-x-[HSTA]-[GS]-[QNH]-K-G-[IVT].

NAME: RNA polymerases M / 15 Kd subunits signature.

CONSENSUS: F-C-x-[DEKST]-C-[GNK]-[DNSA]-[LIVMH]-[LIVM]-x(8,14)-C-x(2)-C.

NAME: RNA polymerases D / 30 to 40 Kd subunits signature.

CONSENSUS: N-[SGA]-[LIVMF]-R-R-x(10)-[SA]-x(2)-V-x(4)-N-x-[STA]-x(3)-[DN]-E-x-[LI]-[GA]-x-R-[LI]-[GA]-[LIVM](4)-P.

NAME: RNA polymerases H / 23 Kd subunits signature.

CONSENSUS: H-[NE]-[LIVM]-V-P-x-H-x(2)-[LIVM]-x(2)-[DE].

NAME: RNA polymerases K / 14 to 18 Kd subunits signature.

CONSENSUS: [ST]-x-[FY]-E-x-[AT]-R-x-[LIVM]-[GSA]-x-R-[SA]-x-Q.
NAME: RNA polymerases L / 13 to 16 Kd subunits signature.

CONSENSUS: [DE](2)-H-[ST]-[LIVM]-[GAP]-N-x(11)-V-x-[FM]-x(2)-Y-x(3)-H-P.

NAME: RNA polymerases N / 8 Kd subunits signature.

CONSENSUS: [LIVMF](2)-P-[LIVM]-x-C-F-[ST]-C-G.

NAME: DNA polymerase family A signature.

CONSENSUS: R-x(2)-[GSAV]-K-x(3)-[LIVMFY]-[AGQ]-x(2)-Y-x(2)-[GS]-x(3)-[LIVMA].

NAME: DNA polymerase family B signature.

CONSENSUS: [YA]-[GLIVMSTAC]-D-T-D-[SG]-[LIVMFTC]-x-[LIVMSTAC].

NAME: DNA polymerase family X signature.

CONSENSUS: G-[SG]-[LFY]-x-R-[GE]-x(3)-[SGCL]-x-D-[LIVM]-D-[LIVMFY](3)-x(2)-[SAP].
NAME: Galactose-1-phosphate uridyl transferase family 1 active site signature.

CONSENSUS: F-E-N-[RK]-G-x(3)-G-x(4)-H-P-H-x-Q.

NAME: Galactose-1-phosphate uridyl transferase family 2 signature.

CONSENSUS: D-L-P-I-V-G-G-[ST]-[LIVM](2)-[SA]-H-[DEN]-H-[FY]-Q-G-G.

NAME: ADP-glucose pyrophosphorylase signature 1.

CONSENSUS: [AG]-G-G-x-G-[STK]-x-L-x(2)-L-[TA]-x(3)-A-x-P-A-[LV].



NAME: ADP-glucose pyrophosphorylase signature 2.

CONSENSUS: W-[FY]-x-G-[ST]-A-[DNSH]-[AS]-[LIVMFYW].

NAME: ADP-glucose pyrophosphorylase signature 3.

CONSENSUS: [APV]-[GS]-M-G-[LIVMN]-Y-[IV]-[LIVMFY]-x(2)-[DENPH].

NAME: Phosphatidate cytidylyltransferase signature.

CONSENSUS: S-x-[LIVMF]-K-R-x(4)-K-D-x-[GSA]-x(2)-[LI]-[PG]-x-H-G-G-[LIVM]-x-D-R-[LIVMFT]-D.

NAME: Ribonuclease PH signature.

CONSENSUS: C-[DE]-[LIVM](2)-Q-[GTA]-D-G-[SG]-x(2)-[TA]-A.



NAME: 2'-5'-oligoadenylate synthetases signature 1.

CONSENSUS: G-G-S-x-[AG]-[KR]-x-T-x-L-[KR]-[GST]-x-S-D-[AG].

NAME: 2'-5'-oligoadenylate synthetases signature 2.

CONSENSUS: R-P-V-I-L-D-P-x-[DE]-P-T.



NAME: CDP-alcohol phosphatidyltransferases signature.

CONSENSUS: D-G-x(2)-A-R-x(8)-G-x(3)-D-x(3)-D.

NAME: PEP-utilizing enzymes phosphorylation site signature.

CONSENSUS: G-[GA]-x-[TN]-x-H-[STA]-[STAV]-[LIVM](2)-[STAV]-[RG].

NAME: PEP-utilizing enzymes signature 2.

CONSENSUS: [DEQS]-x-[LIVMF]-S-[LIVMF]-G-[ST]-N-D-[LIVM]-x-Q-[LIVMFYGT]-[STALIV]-[LIVMF]-[GAS]-x(2)-R.

NAME: Rhodanese signature 1.

CONSENSUS: [FY]-x(3)-H-[LIV]-P-G-A-x(2)-[LIVF].

NAME: Rhodanese C-terminal signature.

CONSENSUS: [AV]-x(2)-[FY]-[DEAP]-G-[GSA]-[WF]-x-E-[FYW].

NAME: CoA transferases signature 1.

CONSENSUS: [DN]-[GN]-x(2)-[LIVMFA](3)-G-G-F-x(3)-G-x-P.



NAME: CoA transferases signature 2.

CONSENSUS: [LF]-[HQ]-S-E-N-G-[LIVF](2)-[GA].

NAME: Phospholipase A2 histidine active site.

CONSENSUS: C-C-x(2)-H-x(2)-C.



NAME: Phospholipase A2 aspartic acid active site.

CONSENSUS: [LIVMA]-C-[LIVMFYWPCST]-C-D-x(5)-C.

NAME: Lipases, serine active site.

CONSENSUS: [LIV]-x-[LIVFY]-[LIVMST]-G-[HYWV]-S-x-G-[GSTAC].

NAME: Colipase signature.

CONSENSUS: Y-x(2)-Y-Y-x-C-x-C.

NAME: Lipolytic enzymes "G-D-S-L" family, serine active site.

CONSENSUS: [LIVMFYAG](4)-G-D-S-[LIVM]-x(1,2)-[TAG]-G.

NAME: Lipolytic enzymes "G-D-X-G" family, putative histidine active site.

CONSENSUS: [LIVMF](2)-x-[LIVMF]-H-G-G-[SAG]-[FY]-x(3)-[STDN]-x(2)-[ST]-H.

NAME: Lipolytic enzymes "G-D-X-G" family, putative serine active site.

CONSENSUS: [LIVM]-x-[LIVMF]-[SA]-G-D-S-[CA]-G-[GA]-x-L-[CA].



NAME: Carboxylesterases type-B serine active site.

CONSENSUS: F-[GR]-G-x(4)-[LIVM]-x-[LIV]-x-G-x-S-[STAG]-G.



NAME: Carboxylesterases type-B signature 2.

CONSENSUS: [EP]-D-C-L-[YT]-[LIV]-[DNS]-[LIV]-[LIVFYW]-x-[PQR].

NAME: Pectinesterase signature 1.

CONSENSUS: [GSTN]-x(5)-[LIVM]-x-[LIVM]-x(2)-G-x-Y-[DNK]-E-x-[LIVM]-x-[LIVM].

NAME: Pectinesterase signature 2.

CONSENSUS: G-[STAD]-[LIVMT]-D-F-I-F-G.
NAME: Peptidyl-tRNA hydrolase signature 1.

CONSENSUS: [FY]-x(2)-T-R-H-N-x-G-x(2)-[LIVMFA](2)-[DE].

NAME: Peptidyl-tRNA hydrolase signature 2.

CONSENSUS: [GS]-x(3)-H-N-G-[LIVM]-[KR]-[DNS]-[LIVMT].

NAME: Alkaline phosphatase active site.

CONSENSUS: [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.

NAME: Histidine acid phosphatases phosphohistidine signature.

CONSENSUS: [LIVM]-x(2)-[LIVMA]-x(2)-[LIVM]-x-R-H-[GN]-x-R-x-[PAS].

NAME: Histidine acid phosphatases active site signature.

CONSENSUS: [LIVMF]-x-[LIVMFAG]-x(2)-[STAGI]-H-D-[STANQ]-x-[LIVM]-x(2)-[LIVMFY]-x(2)-[STA].

NAME: Class A bacterial acid phosphatases signature.

CONSENSUS: G-S-Y-P-S-G-H-T.



NAME: 5'-nucleotidase signature 1.

CONSENSUS: [LIVM]-x-[LIVM](2)-[HEA]-[TI]-x-D-x-H-[GSA]-x-[LIVMF].

NAME: 5'-nucleotidase signature 2.

CONSENSUS: [FYP]-x(4)-[LIVM]-G-N-H-E-F-[DN].

NAME: Fructose-1,6-bisphosphatase active site.

CONSENSUS: [AG]-[RK]-L-x(1,2)-[LIV]-[FY]-E-x(2)-P-[LIVM]-[GSA].

NAME: Serine/threonine specific protein phosphatases signature.

CONSENSUS: [LIVM]-R-G-N-H-E.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 1.

CONSENSUS: E-F-D-Y-L-K-S-L-E-I-E-E-K-I-N.

NAME: Protein phosphatase 2A regulatory subunit PR55 signature 2.

CONSENSUS: N-[AG]-H-[TA]-Y-H-I-N-S-I-S-[LIVM]-N-S-D.

NAME: Protein phosphatase 2C signature.

CONSENSUS: [LIVMFY]-[LIVMFYA]-[GSAC]-[LIVM]-[FYC]-D-G-H-[GAV].

NAME: Tyrosine specific protein phosphatases active site.

CONSENSUS: [LIVMF]-H-C-x(2)-G-x(3)-[STC]-[STAGP]-x-[LIVMFY].

NAME: Tyrosine specific protein phosphatases profile.

NAME: Dual specificity protein phosphatase profile.

NAME: PTP type protein phosphatase profile.
NAME: Inositol monophosphatase family signature 1.

CONSENSUS: [FWV]-x(0,1)-[LIVM]-D-P-[LIVM]-D-[GS]-[ST]-x(2)-[FY]-x-[HKRNSTY].

NAME: Inositol monophosphatase family signature 2.

CONSENSUS: [WV]-D-x-[AC]-[GSA]-[GSAPV]-x-[LIVACP]-[LIV]-[LIVAC]-x(3)-[GH]-[GA].

NAME: Prokaryotic zinc-dependent phospholipase C signature.

CONSENSUS: H-Y-x-[GT]-D-[LIVM]-[DNS]-x-P-x-H-[PA]-x-N.

NAME: Phosphatidylinositol-specific phospholipase X-box domain profile.

NAME: Phosphatidylinositol-specific phospholipase Y-box domain profile.



NAME: 3',5'-cyclic nucleotide phosphodiesterases signature.

CONSENSUS: H-D-[LIVMFY]-x-H-x-[AG]-x(2)-[NQ]-x-[LIVMFY].

NAME: cAMP phosphodiesterases class-II signature.

CONSENSUS: H-x-H-L-D-H-[LIVM]-x-[GS]-[LIVMA]-[LIVM](2)-x-S-[AP].

NAME: Sulfatases signature 1.

CONSENSUS: [SAP]-[LIVMST]-[CS]-[STAC]-P-[STA]-R-x(2)-[LIVMFW](2)-[TR]-G.

30 NAME: Sulfatases signature E • 

CONSENSUS: G-EYV3-x-EST3-xCE)-EIVA3-G-K-x<D-,l)-EFYlilK3-EHL3. 

NAME: AP endonucleases family 1 signature !• 
CONSENSUS: EAPF3-D-ELIVMF3(E)-x-ELIVM3-<J-E-x-K. 



NAME: AP endonucleases family 1 signature 5. 

CONSENSUS: D-EST3-EFY3-R-EKH3-X ( 7-.fi) -EFYW3-EST3-EFYU3 (E) - 



NAME: AP endonucleases family 1 signature 3- 
40 CONSENSUS: N-x-G-x-R-ELIVM3-])-ELIVMFYH3-x-ELV3-x-S - 

NAME: AP endonucleases family E signature 1. 
CONSENSUS: H-x (E ) -Y-ELIVMF3-EIM3-N-ELIVMCA3-EAG3 • 

45 NAME: AP endonucleases family E signature E. 
CONSENSUS: EGR3-ELIVMF3-C-ELIVM3-D-T-C-H . 



NAME: AP endonucleases family E signature 3- 

CONSENSUS: ELIVMIO-H-x-N-EDE3-ESA3-K-x<3)-G-ESA3-x(E)-]>. 

NAME: Deoxyribonuclease I signature 1- 

CONSENSUS: ELIVM3 (S)-EAP3-L-H-ISTA3 (E)-P-x (S) -E-ELIVM J-EDN3- 

X-L-X-EDE3-V • 

55 NAME: Deoxyribonuclease I signature E- 

CONSENSUS: G-D-F-N-A-X-C-ESA3 - 

NAME: Endonuclease III iron-sulfur binding region signature- 
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CONSENSUS: C-x(3)-EKRSJ-P-ICKRAGL]l-C-x(2)-C-x(S)-C. 

NAME: Endonuclease III family signature. 

CONSENSUS: CGST:D-x-ELIVMF]]-P-x(5)-ELIVMU]]-xC2-,3>-ELI3-EPAS:]]- 
5 G-V-EGA3-x(3)-IGAC3- 

CONSENSUS: x (3) - ELIVM3I-X (2) - ESALVl-ELIVMFYhn-EGANO • 

NAME: Ribonuclease II family signature- 

CONSENSUS: EHi:D-EFYE:D-EGSTAM:]|-li:LIVM:il-xCi|-,5>-Y-ESTAL3-x- 
10 [FUVAC3-ETV3-ESA3-P-ELIVI1A]!- 

CONSENSUS: CRUll-CKR3-EFY3-x-])-x(3)-EHl33. 



NAME: Ribonuclease III family signature.
CONSENSUS: [DEQ]-[RQ]-[LM]-E-[FYW]-[LV]-G-D-[SAR].

NAME: Bacterial Ribonuclease P protein component signature.
CONSENSUS: [LIVMFYS]-x(2)-A-x(2)-R-[NH]-[KRQL]-[LIVM]-[KRA]-R-x-[LIVMTA]-[KR].

NAME: Ribonuclease T2 family histidine active site 1.
CONSENSUS: [FYWL]-x-[LIVM]-H-G-L-W-P.

NAME: Ribonuclease T2 family histidine active site 2.
CONSENSUS: [LIVMF]-x(2)-[HDGTY]-[EQ]-[FYW]-x-[KR]-H-G-x-C.

NAME: Pancreatic ribonuclease family signature.
CONSENSUS: C-K-x(2)-N-T-F.

NAME: DNA/RNA non-specific endonucleases active site.
CONSENSUS: D-R-G-H-[QL]-x(3)-A.

NAME: Thermonuclease family signature 1.
CONSENSUS: D-G-D-T-[LIVM]-x-[LIVMC]-x(1,4)-R-[LIVM]-x(2)-[LIVM]-D-x-P-E.

NAME: Thermonuclease family signature 2.
CONSENSUS: D-[KR]-Y-[GQ]-R-x-[LV]-[GA]-x-[IV]-[FYW].

NAME: Beta-amylase active site 1.
CONSENSUS: H-x-C-G-G-N-V-G-D.

NAME: Beta-amylase active site 2.
CONSENSUS: G-x-[SA]-G-E-[LIVM]-R-Y-P-S-Y.

NAME: Glucoamylase active site region signature.
CONSENSUS: [STN]-[GP]-x(1,2)-[DE]-x-W-E-E-x(2)-[GS].

NAME: Polygalacturonase active site.
CONSENSUS: [GSDENKRH]-x(2)-[VMFC]-x(2)-[GS]-H-G-[LIVMAG]-x(1,2)-[LIVM]-G-S.

NAME: Clostridium cellulosome enzymes repeated domain signature.
CONSENSUS: D-[LIVMFY]-[DNV]-x-[DNS]-x(2)-[LIVM]-[DN]-[SALM]-x-P-x(3)-[LIVMF]-x-[RKS]-x-[LIVMF].

NAME: Chitinases family 18 active site.
CONSENSUS: [LIVMFY]-[DN]-G-[LIVMF]-[DN]-[LIVMF]-[DN]-x-E.

NAME: Chitinases family 19 signature 1.
CONSENSUS: C-x(4,5)-F-Y-[ST]-x(3)-[FY]-[LIVMF]-x-A-x(3)-[YF]-x(2)-F-[GSA].

NAME: Chitinases family 19 signature 2.
CONSENSUS: [LIVM]-[GSA]-F-x-[STAG](2)-[LIVMFY]-W-[FY]-W-[LIVM].

NAME: Alpha-lactalbumin / lysozyme C signature.
CONSENSUS: C-x(3)-C-x(2)-[LMF]-x(3)-[DEN]-[LI]-x(5)-C.

NAME: Alpha-galactosidase signature.
CONSENSUS: G-[LIVMFY]-x(2)-[LIVMFY]-x-[LIVM]-D-D-x-W-x(3,7)-R-[DNSF].



NAME: Trehalase signature 1.
CONSENSUS: P-G-G-R-F-x-E-x-Y-x-W-D-x-Y.

NAME: Trehalase signature 2.
CONSENSUS: Q-W-D-x-P-x-[GA]-W-[PA]-P.

NAME: Alpha-L-fucosidase putative active site.
CONSENSUS: P-x(2)-L-x(3)-K-W-E-x-C.

NAME: Glycosyl hydrolases family 1 active site.
CONSENSUS: [LIVMFSTC]-[LIVFYS]-[LIV]-[LIVMST]-E-N-G-[LIVMFAR]-[CSAGN].

NAME: Glycosyl hydrolases family 1 N-terminal signature.
CONSENSUS: F-x-[FYWM]-[GSTA]-x-[GSTA]-x-[GSTA](2)-[FYNH]-[NQ]-x-E-x-[GSTA].

NAME: Glycosyl hydrolases family 2 signature 1.
CONSENSUS: N-x-[LIVMFYWD]-R-[STACN](2)-H-Y-P-x(4)-[LIVMFYW](2)-x(3)-[DN]-x(2)-G-[LIVMFYW](4).

NAME: Glycosyl hydrolases family 2 acid/base catalyst.
CONSENSUS: [ENQF]-[KRVW]-N-H-[AP]-[SA]-[LIVMF](3)-W-[GS]-x(2,3)-N-E.

NAME: Glycosyl hydrolases family 3 active site.
CONSENSUS: [LIVM](2)-[KR]-x-[EQ]-x(4)-G-[LIVMFT]-[LIVT]-[LIVMF]-[ST]-D-x(2)-[SGADNI].

NAME: Glycosyl hydrolases family 5 signature.
CONSENSUS: [LIV]-[LIVMFYWGA](2)-[DNEQG]-[LIVMGST]-x-N-E-[PV]-[RHDNSTLIVFY].

NAME: Glycosyl hydrolases family 6 signature 1.
CONSENSUS: V-x-Y-x(2)-P-x-R-P-C-[GSAF]-x(2)-[GSA](2)-x-G.

NAME: Glycosyl hydrolases family 6 signature 2.
CONSENSUS: [LIVMYA]-[LIVA]-[LIVT]-[LIV]-E-P-D-[SAL]-[LI]-[PSAG].
NAME: Glycosyl hydrolases family 8 signature.
CONSENSUS: A-[ST]-D-[AG]-D-x(2)-[IM]-A-x-[SA]-[LIVM]-[LIVMG]-x-A-x(3)-[FW].

NAME: Glycosyl hydrolases family 9 active sites signature 1.
CONSENSUS: [STV]-x-[LIVMFY]-[STV]-x(2)-G-x-[NKR]-x(4)-[PLIVM]-H-x-R.

NAME: Glycosyl hydrolases family 9 active sites signature 2.
CONSENSUS: [FYW]-x-D-x(4)-[FYW]-x(3)-E-x-[STA]-x(3)-N-[STA].

NAME: Glycosyl hydrolases family 10 active site.
CONSENSUS: [GTA]-x(2)-[LIVN]-x-[IVMF]-[ST]-E-[LIY]-[DN]-[LIVMF].

NAME: Glycosyl hydrolases family 11 active site signature 1.
CONSENSUS: [PSA]-[LQ]-x-E-Y-Y-[LIVM](2)-[DE]-x-[FYWHN].

NAME: Glycosyl hydrolases family 11 active site signature 2.
CONSENSUS: [LIVMF]-x(2)-E-[AG]-[YWG]-[QRFGS]-[SG]-[STAN]-G-x-[SAF].

NAME: Glycosyl hydrolases family 16 active sites.
CONSENSUS: E-[LIV]-D-[LIV]-x(0,1)-E-x(2)-[GQ]-[KRNF]-x-[PSTA].

NAME: Glycosyl hydrolases family 17 signature.
CONSENSUS: [LIVM]-x-[LIVMFYWA](3)-[STAG]-E-[STA]-G-W-P-[STN]-x-[SAGQ].

NAME: Glycosyl hydrolases family 25 active sites signature.
CONSENSUS: D-[LIVM]-x(3)-[NQ]-[PG]-x(7,10)-G-x(4)-[LIVMFY](2)-K-x-[ST]-E-[GS]-x(2)-Y-x-[DN].

NAME: Glycosyl hydrolases family 31 active site.
CONSENSUS: [GF]-[LIVMF]-W-x-D-M-[NSA]-E.

NAME: Glycosyl hydrolases family 31 signature 2.
CONSENSUS: G-[AV]-D-[LIVMT]-C-G-[FY]-x(3)-[ST]-x(3)-L-C-x-R-W-x(2)-[LV]-[GS]-[SA]-F-x-P-F-x-R-[DN].

NAME: Glycosyl hydrolases family 32 active site.
CONSENSUS: H-x(2)-P-x(4)-[LIVM]-N-D-P-N-G.



NAME: Glycosyl hydrolases family 35 putative active site.
CONSENSUS: G-G-P-[LIVM](2)-x(2)-Q-x-E-N-E-[FY].

NAME: Glycosyl hydrolases family 39 active site.
CONSENSUS: W-x-F-E-x-W-N-E-P-[DN].

NAME: Glycosyl hydrolases family 45 active site.
CONSENSUS: [STA]-T-R-Y-[FYW]-D-x(5)-[CA].

NAME: Prokaryotic transglycosylases signature.
CONSENSUS: [LIVM]-x(3)-E-S-x(3)-[AP]-x(3)-S-x(5)-G-[LIVM]-[LIVMFYW]-x-[LIVMFYW]-x(4)-[SAG].

NAME: Inosine-uridine preferring nucleoside hydrolase family signature.
CONSENSUS: D-x-D-[PT]-[GA]-x-D-D-[TAV]-[VI]-A.

NAME: Alkylbase DNA glycosidases alkA family signature.
CONSENSUS: G-I-G-x-W-[ST]-[AV]-x-[LIVMFY](2)-x-[LIVM]-x(8)-[HF]-x(5)-[ED]-D.

NAME: Formamidopyrimidine-DNA glycosylase signature.
CONSENSUS: C-x(2,4)-C-x-[GTA]-x-[IV]-x(7)-R-[GSTAN]-[STA]-x-[FY]-C-x(2)-C-Q.

NAME: Uracil-DNA glycosylase signature.
CONSENSUS: [KR]-[LIV]-[LIV]-[LIVM]-x-G-[QE]-D-P-Y.

NAME: S-adenosyl-L-homocysteine hydrolase signature 1.
CONSENSUS: [CS]-N-x-[FYL]-S-[ST]-[QA]-[DEN]-x-[AV](2)-A-[LIV]-[SAV].

NAME: S-adenosyl-L-homocysteine hydrolase signature 2.
CONSENSUS: G-K-x(3)-[LIV]-x-G-Y-G-x-V-G-[KR]-G-x-A.

NAME: Cytosol aminopeptidase signature.
CONSENSUS: N-T-D-A-E-G-R-L.

NAME: Aminopeptidase P and proline dipeptidase signature.
CONSENSUS: [HA]-[GSYR]-[LIVMT]-[SG]-H-x-[LIV]-G-[LIVM]-x-[IV]-H-[DE].

NAME: Methionine aminopeptidase subfamily 1 signature.
CONSENSUS: [MFY]-x-G-H-G-[LIVMC]-[GSH]-x(3)-H-x(4)-[LIVM]-x-[HN]-[YWV].

NAME: Methionine aminopeptidase subfamily 2 signature.
CONSENSUS: [DA]-[LIVMY]-x-K-[LIVM]-D-x-G-x-[HQ]-[LIVM]-[DNS]-G-x(3)-[DN].

NAME: Renal dipeptidase active site.
CONSENSUS: [LIVM]-E-G-[GA]-x(2)-[LIVMF]-x(6)-L-x(3)-Y-x(2)-G-[LIVM]-R.



NAME: Serine carboxypeptidases, serine active site.
CONSENSUS: [LIVM]-x-[GTA]-E-S-Y-[AG]-[GS].

NAME: Serine carboxypeptidases, histidine active site.
CONSENSUS: [LIVF]-x(2)-[LIVSTA]-x-[IVPST]-x-[GSDNQL]-[SAGV]-[SG]-H-x-[IVAQ]-P-x(3)-[PSA].

NAME: Zinc carboxypeptidases, zinc-binding region 1 signature.
CONSENSUS: [PK]-x-[LIVMFY]-x-[LIVMFY]-x(4)-H-[STAG]-x-E-x-[LIVM]-[STAG]-x(6)-[LIVMFYTA].




WO 01/98454 PCT/IB01/02050 

NAME: Zinc carboxypeptidases, zinc-binding region 2 signature.
CONSENSUS: H-[STAG]-x(3)-[LIVME]-x(5)-[LIVMFYW]-P-[FYW].

NAME: Serine proteases, trypsin family, histidine active site.
CONSENSUS: [LIVM]-[ST]-A-[STAG]-H-C.

NAME: Serine proteases, trypsin family, serine active site.
CONSENSUS: [DNSTAGC]-[GSTAPIMVQH]-x(2)-G-[DE]-S-G-[GS]-[SAPHV]-[LIVMFYWH]-[LIVMFYSTANQH].

NAME: Serine proteases, subtilase family, aspartic acid active site.
CONSENSUS: [STAIV]-x-[LIVMF]-[LIVM]-D-[DSTA]-G-[LIVMFC]-x(2,3)-[DNH].

NAME: Serine proteases, subtilase family, histidine active site.
CONSENSUS: H-G-[STM]-x-[VIC]-[STAGC]-[GS]-x-[LIVMA]-[STAGCLV]-[SAGM].

NAME: Serine proteases, subtilase family, serine active site.
CONSENSUS: G-T-S-x-[SA]-x-P-x(2)-[STAVC]-[AG].

NAME: Serine proteases, V8 family, histidine active site.
CONSENSUS: [ST]-G-[LIVMFYW](3)-[GN]-x(2)-T-[LIVM]-x-T-x(2)-H.

NAME: Serine proteases, V8 family, serine active site.
CONSENSUS: T-x(2)-[GC]-[NQ]-S-G-S-x-[LIVM]-[FY].

NAME: Serine proteases, omptin family signature 1.
CONSENSUS: W-T-D-x-S-x-H-P-x-T.

NAME: Serine proteases, omptin family signature 2.
CONSENSUS: A-G-Y-Q-E-[ST]-R-[FY]-S-[FYW]-[TN]-A-x-G-G-[ST]-Y.

NAME: Prolyl endopeptidase family serine active site.
CONSENSUS: D-x(3)-A-x(3)-[LIVMFYW]-x(14)-G-x-S-x-G-G-[LIVMFYW](2).
NAME: Endopeptidase Clp serine active site.
CONSENSUS: T-x(2)-[LIVMF]-G-x-A-[SAC]-S-[MSA]-[PAG]-[STA].

NAME: Endopeptidase Clp histidine active site.
CONSENSUS: R-x(3)-[EAP]-x(3)-[LIVMFYT]-M-[LIVM]-H-Q-P.

NAME: ATP-dependent serine proteases, lon family, serine active site.
CONSENSUS: D-G-[PD]-S-A-[GS]-[LIVMCA]-[TA]-[LIVM].

NAME: Eukaryotic thiol (cysteine) proteases cysteine active site.
CONSENSUS: Q-x(3)-[GE]-x-C-[YW]-x(2)-[STAGC]-[STAGCV].

NAME: Eukaryotic thiol (cysteine) proteases histidine active site.
CONSENSUS: [LIVMGSTAN]-x-H-[GSACE]-[LIVM]-x-[LIVMAT](2)-G-x-[GSADNH].

NAME: Eukaryotic thiol (cysteine) proteases asparagine active site.
CONSENSUS: [FYCH]-[WI]-[LIVT]-x-[KRQAG]-N-[ST]-W-x(3)-[FYW]-G-x(2)-G-[LFYW]-[LIVMFYG]-x-[LIVMF].

NAME: Ubiquitin carboxyl-terminal hydrolase family 1 cysteine active site.
CONSENSUS: Q-x(3)-N-[SA]-C-G-x(3)-[LIVM](2)-H-[SA]-[LIVM]-[SA].

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 1.
CONSENSUS: G-[LIVMFY]-x(1,3)-[AGC]-[NASM]-x-C-[FYW]-[LIVMC]-[NST]-[SACV]-x-[LIVMS]-Q.

NAME: Ubiquitin carboxyl-terminal hydrolases family 2 signature 2.
CONSENSUS: Y-x-L-x-[SAG]-[LIVMFT]-x(2)-H-x-G-x(4,5)-G-H-Y.

NAME: Caspase family histidine active site.
CONSENSUS: H-x(2,4)-[SC]-x(4)-[LIVMF](2)-[ST]-H-G.

NAME: Caspase family cysteine active site.
CONSENSUS: K-P-K-[LIVMF](4)-Q-A-C-[RQG]-G.

NAME: Eukaryotic and viral aspartyl proteases active site.
CONSENSUS: [LIVMFGAC]-[LIVMTADN]-[LIVFSA]-D-[ST]-G-[STAV]-[STAPDENQ]-x-[LIVMFSTNC]-x-[LIVMFGTA].

NAME: Neutral zinc metallopeptidases, zinc-binding region signature.
CONSENSUS: [GSTALIVN]-x(2)-H-E-[LIVMFYW]-{DEHRKP}-H-x-[LIVMFYWGSPQ].

NAME: Matrixins cysteine switch.
CONSENSUS: P-R-C-[GN]-x-P-[DR]-[LIVSAPKQ].

NAME: Insulinase family, zinc-binding region signature.
CONSENSUS: G-x(8,24)-G-x-[STA]-H-[LIVMFY]-[LIVMC]-[DERN]-[HRKL]-[LMFAT]-x-[LFSTH]-x-[GSTAN]-[GST].

NAME: Glycoprotease family signature (AC PS01016).
CONSENSUS: [KR]-[GSAT]-x(4)-[FYWHL]-[QNGK]-x-P-x-[LIVMFY]-x(3)-H-x(5)-[AG]-H-[LIVM].
NAME: Proteasome A-type subunits signature.
CONSENSUS: [FY]-x(4)-[STNV]-x-[FYW]-S-P-x-G-[RKH]-x(2)-Q-[LIVM]-[DE]-Y-[SAD]-x(2)-[SAG].

NAME: Proteasome B-type subunits signature.
CONSENSUS: [LIVMA]-[GSA]-[LIVMF]-x-[FYLVGAC]-x(2)-[GSACFY]-[LIVMSTAC](3)-[GAC]-[GSTACV]-[DES]-x(15)-[RK]-x(12,13)-G-x(2)-[GSTA]-D.

NAME: Signal peptidases I serine active site.
CONSENSUS: [GS]-x-S-M-x-[PS]-[AT]-[LF].

NAME: Signal peptidases I lysine active site.
CONSENSUS: K-R-[LIVMSTA](2)-G-x-[PG]-G-[DE]-x-[LIVM]-x-[LIVMFY].

NAME: Signal peptidases I signature 3.
CONSENSUS: [LIVMFYW](2)-x(2)-G-D-[NH]-x(3)-[SND]-x(2)-[SG].

NAME: Signal peptidases II signature.
CONSENSUS: [GAF]-[GA]-[GAS]-[LIVM]-[GAS]-N-[LVMFG]-[LIVMFY]-D-R-[LIMFA].

NAME: Peptidase family U32 signature.
CONSENSUS: E-x-F-x(2)-G-[SA]-[LIVM]-C-x(4)-G-x-C-x-[LIVM]-S.

NAME: Amidases signature.
CONSENSUS: G-[GA]-S-S-[GS]-G-x-[GSA]-[GSAVY]-x-[LIVM]-[GSA]-x(6)-[GSA]-x-[GA]-x-D-x-[GA]-x-S-[LIVM]-R-x-P-[GSAC].

NAME: Asparaginase / glutaminase active site signature 1.
CONSENSUS: [LIVM]-x(2)-T-G-G-T-[IV]-[AGS].

NAME: Asparaginase / glutaminase active site signature 2.
CONSENSUS: G-x-[LIVM]-x(2)-H-G-T-D-T-[LIVM].

NAME: Urease nickel ligands signature.
CONSENSUS: T-[AY]-[GA]-[GAT]-[LIVM]-D-x-H-[LIVM]-H-x(3)-P.

NAME: Urease active site.
CONSENSUS: [LIVM](2)-[CT]-H-[HN]-L-x(3)-[LIVM]-x(2)-D-[LIVM]-x-F-A.

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 1.
CONSENSUS: [LIV]-[GALMY]-[LIVMF]-x-[GSA]-H-x-D-[TV]-[STAV].

NAME: ArgE / dapE / ACY1 / CPG2 / yscS family signature 2.
CONSENSUS: [GSTAI]-[SANQ]-D-x-K-[GSACN]-x(2)-[LIVMA]-x(2)-[LIVMFY]-x(13,14)-[LIVM]-x-[LIVMF]-[LIVMSTAG]-[LIVMFA]-x(2)-[DNG]-E-E-x-[GSTN].

NAME: Dihydroorotase signature 1.
CONSENSUS: D-[LIVMFYWSAP]-H-[LIVA]-H-[LIVF]-[RN]-x-[PGN].
NAME: Dihydroorotase signature 2.
CONSENSUS: [GA]-[ST]-D-x-A-P-H-x(4)-K.

NAME: Beta-lactamase class-A active site.
CONSENSUS: [FY]-x-[LIVMFY]-x-S-[TV]-x-K-x(4)-[AGLM]-x(2)-[LQ].

NAME: Beta-lactamase class-C active site.
CONSENSUS: F-E-[LIVM]-G-S-[LIVMG]-[SA]-K.

NAME: Beta-lactamase class-D active site.
CONSENSUS: [PA]-x-S-[ST]-F-K-[LIV]-[PAL]-x-[STA]-[LI].

NAME: Beta-lactamases class B signature 1.
CONSENSUS: [LI]-x-[STN]-[HN]-x-H-[GSTA]-D-x(2)-G-[GP]-x(7,8)-[GS].

NAME: Beta-lactamases class B signature 2.
CONSENSUS: P-x(3)-[LIVM](2)-x-G-x-C-[LIVMF](2)-K.

NAME: Arginase family signature 1.
CONSENSUS: [LIVMFD]-G-G-x-H-x-[LIVMT]-[STAV]-x-[PAG]-x(3)-[GSTA].

NAME: Arginase family signature 2.
CONSENSUS: [LIVM](2)-x-[LIVMFY]-D-[AS]-H-x-D.

NAME: Arginase family signature 3.
CONSENSUS: [ST]-[LIVMFY]-D-[LIVM]-D-x(3)-[PAQ]-x(3)-P-[GSA]-x(7)-G.

NAME: Adenosine and AMP deaminase signature.
CONSENSUS: [SA]-[LIVM]-[NGS]-[STA]-D-D-P.

NAME: Cytidine and deoxycytidylate deaminases zinc-binding region signature.
CONSENSUS: [CH]-[AGV]-E-x(2)-[LIVMFGAT]-[LIVM]-x(17,33)-P-C-x(2,8)-C-x(3)-[LIVM].

NAME: GTP cyclohydrolase I signature 1.
CONSENSUS: [EN]-[LIVM](2)-x(2)-[KRQN]-[DN]-[LIVP]-x(3)-[ST]-x-C-E-H-H.

NAME: GTP cyclohydrolase I signature 2.
CONSENSUS: [SA]-x-[RK]-x-Q-[LIVM]-Q-E-[RN]-[LI]-[TSN].

NAME: Nitrilases / cyanide hydratase signature 1.
CONSENSUS: G-x(2)-[LIVMFY](2)-x-[IF]-x-E-x(2)-[LIVM]-x-G-Y-P.

NAME: Nitrilases / cyanide hydratase active site signature.
CONSENSUS: G-[GAQ]-x(2)-C-[WA]-E-[NH]-x(2)-[PST]-[LIVMFYS]-x-[KR].

NAME: Inorganic pyrophosphatase signature.
CONSENSUS: D-[SGDN]-D-[PE]-[LIVMF]-D-[LIVMGAC].

NAME: Acylphosphatase signature 1.
CONSENSUS: [LIV]-x-G-x-V-Q-G-V-x-[FY]-R.

NAME: Acylphosphatase signature 2.
CONSENSUS: G-[FYW]-[AVC]-[KRQAM]-N-x(3)-G-x-V-x(5)-G.

NAME: ATP synthase alpha and beta subunits signature.
CONSENSUS: P-[SAP]-[LIV]-[DNH]-x(3)-S-x-S.

NAME: ATP synthase gamma subunit signature.
CONSENSUS: [IV]-T-x-E-x(2)-[DE]-x(3)-G-A-x-[SAKR].

NAME: ATP synthase delta (OSCP) subunit signature.
CONSENSUS: [LIVM]-x-[LIVMFYT]-x(3)-[LIVMT]-[DENQK]-x(2)-[LIVM]-x-[GSA]-G-[LIVMFYGA]-x-[LIVM]-[KRHENQ]-x-[GSEN].

NAME: ATP synthase a subunit signature.
CONSENSUS: [STAGN]-x-[STAG]-[LIVMF]-R-L-x-[SAGV]-N-[LIVMT].

NAME: ATP synthase c subunit signature.
CONSENSUS: [GSTA]-R-[NQ]-P-x(10)-[LIVMFYW](2)-x(3)-[LIVMFYW]-x-[DE].

NAME: E1-E2 ATPases phosphorylation site.
CONSENSUS: D-K-T-G-T-[LI]-[TI].

NAME: Sodium and potassium ATPases beta subunits signature 1.
CONSENSUS: [FY]-x(2)-[FYW]-x-[FYW]-[DN]-x(?)-[LIVM]-G-R-T-x(3)-W.

NAME: Sodium and potassium ATPases beta subunits signature 2.
CONSENSUS: [RK]-x(2)-C-[RKQW]-x(5)-L-x(2)-C-[SA]-G.
NAME: GDA1/CD39 family of nucleoside phosphatases signature.
CONSENSUS: [LIVM]-x-G-x(2)-E-G-x-[FY]-x-[FW]-[LIVA]-[TAG]-x-N-[HY].

NAME: Iodothyronine deiodinases active site.
CONSENSUS: R-P-L-V-x-N-F-G-S-[CA]-T-C-P-x-F.

NAME: Cutinase, serine active site.
CONSENSUS: P-x-[STA]-x-[LIV]-[IVT]-x-[GS]-G-Y-S-[QL]-G.

NAME: Cutinase, aspartate and histidine active sites.
CONSENSUS: C-x(3)-D-x-[IV]-C-x-G-[GST]-x(2)-[LIVM]-x(2,3)-H.

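Many signatures in this list combine fixed gaps such as x(3) with variable gaps such as x(2,3). It can therefore help to know how long a matching peptide may be. The following sketch uses a hypothetical helper, `pattern_length_range`, that adds up the minimum and maximum residue count of each element. It assumes PROSITE-style repeat suffixes and treats the "<" and ">" anchors as consuming no residues. The example pattern is made up for illustration.

```python
import re

def pattern_length_range(pattern: str) -> tuple[int, int]:
    """Return the (shortest, longest) peptide length a PROSITE-style pattern can match.

    Each '-'-separated element matches one residue unless it has a repeat
    suffix: (n) means exactly n residues, (n,m) means n to m residues.
    Terminal anchors '<' / '>' and a trailing '.' consume no residues.
    """
    shortest = longest = 0
    for element in pattern.strip().rstrip(".").split("-"):
        m = re.search(r"\((\d+)(?:,(\d+))?\)$", element.strip("<>"))
        if m is None:
            lo = hi = 1  # a single residue, residue class, or 'x'
        else:
            lo = int(m.group(1))
            hi = int(m.group(2)) if m.group(2) else lo
        shortest += lo
        longest += hi
    return shortest, longest

# Hypothetical pattern, for illustration only.
print(pattern_length_range("C-x(2,4)-C-x(3)-[LIVM]-H"))  # prints (9, 11)
```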


NAME: DDC / GAD / HDC / TyrDC pyridoxal-phosphate attachment site.
CONSENSUS: S-[LIVMFYW]-x(5)-K-[LIVMFYWG](2)-x(3)-[LIVMFYW]-x-[CA]-x(2)-[LIVMFYWQ]-x(2)-[RK].

NAME: Orn/Lys/Arg decarboxylases family 1 pyridoxal-P attachment site.
CONSENSUS: [STAV]-x-S-x-H-K-x(2)-[GSTAN](2)-x-[STA]-Q-[STA](2).

NAME: Orn/DAP/Arg decarboxylases family 2 pyridoxal-P attachment site.
CONSENSUS: [FY]-[PA]-x-K-[SACV]-[NHCLFQ]-x(4)-[LIVMF]-[LIVMTA]-x(5)-[LIVMA]-x(3)-[GTE].

NAME: Orn/DAP/Arg decarboxylases family 2 signature 2.
CONSENSUS: [GS]-x(2,6)-[LIVMSCP]-x(2)-[LIVMF]-[DNS]-[LIVMCA]-G-G-G-[LIVMFY]-[GSTPCEQ].

NAME: Orotidine 5'-phosphate decarboxylase active site.
CONSENSUS: [LIVMFTA]-[LIVMF]-x-D-x-K-x(2)-D-I-[GP]-x-T-[LIVMTA].

NAME: Phosphoenolpyruvate carboxylase active site 1.
CONSENSUS: [VT]-x-T-A-H-P-T-[EQ]-x(2)-R-[KRH].

NAME: Phosphoenolpyruvate carboxylase active site 2.
CONSENSUS: [IV]-M-[LIVM]-G-Y-S-D-S-x-K-D-[STAG]-G.
NAME: Phosphoenolpyruvate carboxykinase (GTP) signature.
CONSENSUS: F-P-S-A-C-G-K-T-N.

NAME: Phosphoenolpyruvate carboxykinase (ATP) signature.
CONSENSUS: L-I-G-D-D-E-H-x-W-x-[DE]-x-G-[IV]-x-N.

NAME: Uroporphyrinogen decarboxylase signature 1.
CONSENSUS: P-x-W-x-M-R-Q-A-G-R.

NAME: Uroporphyrinogen decarboxylase signature 2.
CONSENSUS: G-F-[STAGCV]-[STAG]-x-P-[FYW]-T-[LV]-x(5)-Y-x(2)-[AE]-G.

NAME: Indole-3-glycerol phosphate synthase signature.
CONSENSUS: [LIVMFY]-[LIVM]-x-E-[LIVMFYC]-K-[KRSP]-[STA]-S-P-[ST]-x(3)-[LIVMFYST].

NAME: Ribulose bisphosphate carboxylase large chain active site.
CONSENSUS: G-x-[DN]-F-x-K-x-D-E.

NAME: Fructose-bisphosphate aldolase class-I active site.
CONSENSUS: [LIVM]-x-[LIVMFYW]-E-G-x-[LS]-L-K-P-[SN].

NAME: Fructose-bisphosphate aldolase class-II signature 1.
CONSENSUS: [FYVM]-x(1,3)-[LIVM]-[APN]-[LIVM]-x(1,2)-[LIVM]-H-x-D-H-[GAC].

NAME: Fructose-bisphosphate aldolase class-II signature 2.
CONSENSUS: [LIVM]-E-x-E-[LIVM]-G-x(2)-[GM]-[GSTA]-x-E.

NAME: Malate synthase signature.
CONSENSUS: [KR]-[DENQ]-H-x(2)-G-L-N-x-G-x-W-D-Y-[LIVM]-F.

NAME: Hydroxymethylglutaryl-coenzyme A lyase active site.
CONSENSUS: S-V-A-G-L-G-G-C-P-Y.
NAME: Hydroxymethylglutaryl-coenzyme A synthase active site.
CONSENSUS: N-x-[DN]-[IV]-E-G-[IV]-D-x(2)-N-A-C-[FY]-x-G.

NAME: Citrate synthase signature.
CONSENSUS: G-[FYA]-[GA]-H-x-[IV]-x(1,2)-[RKT]-x(2)-D-[PS]-R.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 1.
CONSENSUS: L-R-[DE]-G-x-Q-x(10)-K.

NAME: Alpha-isopropylmalate and homocitrate synthases signature 2.
CONSENSUS: [LIVMFW]-x(2)-H-x-H-[DN]-D-x-G-x-[GAS]-x-[GASLI].

NAME: KDPG and KHG aldolases active site.
CONSENSUS: G-[LIVM]-x(3)-E-[LIV]-T-[LF]-R.

NAME: KDPG and KHG aldolases Schiff-base forming residue.
CONSENSUS: G-x(3)-[LIVMF]-K-[LF]-F-P-[SA]-x(3)-G.

NAME: Isocitrate lyase signature.
CONSENSUS: K-[KR]-C-G-H-[LMQ].

NAME: Beta-eliminating lyases pyridoxal-phosphate attachment site.
CONSENSUS: Y-x-D-x(3)-M-S-[GA]-K-K-D-x-[LIVM](2)-x-[LIVM]-G-G.

NAME: DNA photolyases class 1 signature 1.
CONSENSUS: T-G-x-P-[LIVM](2)-D-A-x-M-[RA]-x-[LIVM].

NAME: DNA photolyases class 1 signature 2.
CONSENSUS: [DN]-R-x-R-[LIVM](2)-x-[STA](2)-[LIVMFA]-x-K-x-L-x(2,3)-W-[KRQ].

NAME: DNA photolyases class 2 signature 1.
CONSENSUS: F-x-E-E-x-[LIVM](2)-R-R-E-L-x(2)-N-F.

NAME: DNA photolyases class 2 signature 2.
CONSENSUS: G-x-H-D-x(2)-W-x-E-R-x-[LIVM]-F-G-K-[LIVM]-R-[FY]-M-N.

NAME: Eukaryotic-type carbonic anhydrases signature.
CONSENSUS: S-E-H-x-[LIVM]-x(4)-[FYH]-x(2)-E-[LIVM]-H-[LIVMFA](2).

NAME: Prokaryotic-type carbonic anhydrases signature 1.
CONSENSUS: C-[SA]-D-S-R-[LIVM]-x-[AP].

NAME: Prokaryotic-type carbonic anhydrases signature 2.
CONSENSUS: [EQ]-Y-A-[LIVM]-x(2)-[LIVM]-x(4)-[LIVMF](3)-x-G-H-x(2)-C-G.

NAME: Fumarate lyases signature.
CONSENSUS: G-S-x(2)-M-x(2)-K-x-N.

NAME: Aconitase family signature 1.
CONSENSUS: [LIVM]-x(2)-[GSACIVM]-x-[LIV]-[CTIV]-[STP]-C-x(0,1)-T-N-[GSTANI]-x(4)-[LIVMA].

NAME: Aconitase family signature 2.
CONSENSUS: G-x(2)-[LIVWPQ]-x(3)-[GAC]-C-[GSTAM]-[LIMPTA]-C-[LIMV]-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 1.
CONSENSUS: C-D-K-x(2)-P-[GA]-x(3)-[GA].

NAME: Dihydroxy-acid and 6-phosphogluconate dehydratases signature 2.
CONSENSUS: [SA]-L-[LIVM]-T-D-[GA]-R-[LIVMF]-S-[GA]-[GAV]-[ST].

NAME: Dehydroquinase class I active site.
CONSENSUS: D-[LIVM]-[DE]-[LIVM]-x(18,20)-[LIVM](2)-x-[LSC]-[NHY]-H-[DN].

NAME: Dehydroquinase class II signature.
CONSENSUS: [LIVM]-[NQ]-G-P-N-[LV]-x(2)-L-G-x-R-[QED]-P-x(2)-[FY]-G.
NAME: Enolase signature.
CONSENSUS: [LIV](2)-K-x-N-Q-I-G-[ST]-[LIV]-[ST]-[DE]-[STA].

NAME: Serine/threonine dehydratases pyridoxal-phosphate attachment site.
CONSENSUS: [DESH]-x(4,5)-[STVG]-x-[AS]-[FYI]-K-[DLIFSA]-[RVMF]-[GA]-[LIVMGA].

NAME: Enoyl-CoA hydratase/isomerase signature.
CONSENSUS: [LIVM]-[STA]-x-[LIVM]-[DENQRHSTA]-G-x(3)-[AG](3)-x(4)-[LIVMST]-x-[CSTA]-[DQHP]-[LIVMFY].

NAME: Imidazoleglycerol-phosphate dehydratase signature 1.
CONSENSUS: [LIVMY]-[DE]-x-H-H-x(2)-E-x(2)-[GCA]-[LIVM]-[STAC]-[LIVM].

NAME: Imidazoleglycerol-phosphate dehydratase signature 2.
CONSENSUS: G-x-[DN]-x-H-H-x(2)-E-[STAGC]-x-[FY]-K.

NAME: Tryptophan synthase alpha chain signature.
CONSENSUS: [LIVM]-E-[LIVM]-G-x(2)-[FYC]-[ST]-[DE]-[PA]-[LIVMY]-[AGLI]-[DE]-G.

NAME: Tryptophan synthase beta chain pyridoxal-phosphate attachment site.
CONSENSUS: [LIVM]-x-H-x-G-[STA]-H-K-x-N.

NAME: Delta-aminolevulinic acid dehydratase active site.
CONSENSUS: G-x-D-x-[LIVM](2)-[IV]-K-P-[GSA]-x(2)-Y.

NAME: Urocanase active site.
CONSENSUS: F-W-G-L-P-x-R-I-C-W.

NAME: Prephenate dehydratase signature 1.
CONSENSUS: [FY]-x-[LIVM]-x(2)-[LIVM]-x(5)-[DN]-x(5)-T-R-F-[LIVMW]-x-[LIVM].

NAME: Prephenate dehydratase signature 2.
CONSENSUS: [LIVM]-[ST]-[KR]-[LIVM]-E-[ST]-R-P.

NAME: Dihydrodipicolinate synthetase signature 1.
CONSENSUS: [GSA]-[LIVM]-[LIVMFY]-x(2)-G-[ST]-[TG]-G-E-[GASNF]-x(4)-[EQ].

NAME: Dihydrodipicolinate synthetase signature 2.
CONSENSUS: Y-[DNS]-[LIVMF]-P-x(2)-[ST]-x(3)-[LIVM]-x(13,14)-[LIVM]-x-[SGA]-[LIVMF]-K-[DEQAF]-[STAC].
NAME: RsuA family of pseudouridine synthase signature.
CONSENSUS: G-R-L-D-x(2)-[ST]-x-G-[LIVMF](4)-[ST]-[DNT].

NAME: Cysteine synthase/cystathionine beta-synthase P-phosphate attachment site.
CONSENSUS: K-x-E-x(3)-[PA]-[STAGC]-x-S-[IVAP]-K-x-R-x-[STAG]-x(2)-[LIVM].

NAME: Phenylalanine and histidine ammonia-lyases signature.
CONSENSUS: G-[STG]-[LIVM]-[STG]-[AC]-S-G-[DH]-L-x-P-L-[SA]-x(2)-[SA].

NAME: Porphobilinogen deaminase cofactor-binding site.
CONSENSUS: E-R-x-[LIVMFA]-x(3)-[LIVMF]-x-G-[GSA]-C-x-[IVT]-P-[LIVMF]-[GSA].

NAME: Cys/Met metabolism enzymes pyridoxal-phosphate attachment site.
CONSENSUS: [DQ]-[LIVMF]-x(3)-[STAGC]-[STAGCI]-T-K-[FYWQ]-[LIVMF]-x-G-[HQ]-[SGNH].

NAME: Glyoxalase I signature 1.
CONSENSUS: [HQ]-[IVT]-x-[LIVFY]-x-[IV]-x(5)-[STA]-x(2)-F-[YM]-x(2,3)-[LMF]-G-[LMF].

NAME: Glyoxalase I signature 2.
CONSENSUS: G-[NTKQ]-x(0,5)-[GA]-[LVFY]-[GH]-H-[IVF]-[CGA]-x-[STAGL]-x(2)-[DNC].

NAME: Cytochrome c and c1 heme lyases signature 1.
CONSENSUS: H-N-x(2)-N-E-x(2)-W-[NQKR]-x(2)-W-E.

NAME: Cytochrome c and c1 heme lyases signature 2.
CONSENSUS: P-F-D-R-H-D-W.

NAME: Adenylate cyclases class-I signature 1.
CONSENSUS: E-Y-F-G-[SA](2)-L-W-x-L-Y-K.

NAME: Adenylate cyclases class-I signature 2.
CONSENSUS: Y-R-N-x-W-[NS]-E-[LIVM]-R-T-L-H-F-x-G.






NAME: Guanylate cyclases signature.
CONSENSUS: G-V-[LIVM]-x(0,1)-G-x(5)-[FY]-x-[LIVM]-[FYW]-[GS]-[DNTHKW]-[DNT]-[IV]-[DNTA]-x(5)-[DE].

NAME: Chorismate synthase signature 1.
CONSENSUS: G-E-S-H-[GC]-x(2)-[LIVM]-[GTV]-x-[LIVM](2)-[DE]-G-x-[PV].

NAME: Chorismate synthase signature 2.
CONSENSUS: [GE]-R-[SA](2)-[SAG]-R-[EV]-[ST]-x(2)-[RH]-V-x(2)-G.

NAME: Chorismate synthase signature 3.
CONSENSUS: R-[SH]-D-[PSV]-[CSAV]-x(4)-[GAI]-x-[IVGSP]-[LIVM]-x-E-[STAH]-[LIVM].

NAME: 6-pyruvoyl tetrahydropterin synthase signature 1.
CONSENSUS: C-N-N-x(2)-G-H-G-H-N-Y.

NAME: 6-pyruvoyl tetrahydropterin synthase signature 2.
CONSENSUS: D-H-K-N-L-D-x-D.

NAME: Ferrochelatase signature.
CONSENSUS: [LIVMF](2)-x-S-x-H-[GS]-[LIVM]-P-x(4,5)-[DENQKR]-x-G-D-x-Y.

NAME: Alanine racemase pyridoxal-phosphate attachment site.
CONSENSUS: V-x-K-A-[DN]-[GA]-Y-G-H-G.

NAME: Aspartate and glutamate racemases signature 1.
CONSENSUS: [IVA]-[LIVM]-x-C-x(0,1)-N-[ST]-[MSA]-[STH]-[LIVFYSTANK].

NAME: Aspartate and glutamate racemases signature 2.
CONSENSUS: [LIVM](2)-x-[AG]-C-T-[DEH]-[LIVMFY]-[PNGRS]-x-[LIVM].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 1.
CONSENSUS: A-x-[SAG](2)-[LIVM]-[DE]-x-A-x(2)-D-x(2)-[GA]-[KR].

NAME: Mandelate racemase / muconate lactonizing enzyme family signature 2.
CONSENSUS: G-x(?)-D-x(?)-A-x(14)-[LIVM]-E-[DENQ]-P-x(4)-[DENQ].

NAME: Ribulose-phosphate 3-epimerase family signature 1.
CONSENSUS: [LIVMF]-H-[LIVMFY]-D-[LIVM]-x-D-x(1,2)-[FY]-[LIVM]-x-N-x-[STAV].

NAME: Ribulose-phosphate 3-epimerase family signature 2.
CONSENSUS: [LIVMA]-x-[LIVM]-M-[ST]-[VS]-x-P-x(3)-G-Q-x-F-x(6)-[NK]-[LIVMC].

NAME: Aldose 1-epimerase putative active site.
CONSENSUS: [NS]-x-T-N-H-x-Y-[FW]-N-[LI].
NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase signature.
CONSENSUS: [FY]-x(2)-[STCNLV]-x-F-H-[RH]-[LIVMN]-[LIVM]-x(2)-F-[LIVM]-x-Q-[AG]-G.

NAME: Cyclophilin-type peptidyl-prolyl cis-trans isomerase profile.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 1.
CONSENSUS: [LIVMC]-x-[YF]-x-[GVL]-x(1,2)-[LFT]-x(2)-G-x(3)-[DE]-[STAEQ]-[STAN].

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase signature 2.
CONSENSUS: [LIVMFY]-x(2)-[GA]-x(3,4)-[LIVMF]-x(2)-[LIVMFHK]-x(2)-G-x(4)-[LIVMF]-x(3)-[PSGAQ]-x(2)-[AG]-[FY]-G.

NAME: FKBP-type peptidyl-prolyl cis-trans isomerase domain profile.

NAME: PpiC-type peptidyl-prolyl cis-trans isomerase signature.
CONSENSUS: F-[GSADEI]-x-[LVAQ]-A-x(3)-[ST]-x(3,4)-[STQ]-x(3,5)-[GER]-G-x-[LIVM]-[GS].

NAME: Triosephosphate isomerase active site.
CONSENSUS: [AV]-Y-E-P-[LIVM]-W-[SA]-I-G-T-[GK].

NAME: Xylose isomerase signature 1.
CONSENSUS: [LI]-E-P-K-P-x(2)-P.

NAME: Xylose isomerase signature 2.
CONSENSUS: [FL]-H-D-x-D-[LIV]-x-[PD]-x-[GDE].

NAME: Phosphomannose isomerase type I signature 1.
CONSENSUS: Y-x-D-x-N-H-K-P-E.

NAME: Phosphomannose isomerase type I signature 2.
CONSENSUS: H-A-Y-[LIVM]-x-G-x(2)-[LIVM]-E-x-M-A-x-S-D-N-x-[LIVM]-R-A-G-x-T-P-K.

NAME: Phosphoglucose isomerase signature 1.
CONSENSUS: [DENS]-x-[LIVM]-G-G-R-[FY]-S-[LIVMT]-x-[STA]-[PSAC]-[LIVMA]-G.

NAME: Phosphoglucose isomerase signature 2.
CONSENSUS: [GS]-x-[LIVM]-[LIVMFYW]-x(4)-[FY]-[DN]-Q-x-G-V-E-x(2)-K.

NAME: Glucosamine/galactosamine-6-phosphate isomerases signature.
CONSENSUS: [LIVM]-x(3)-G-x-[LIT]-x-[LIV]-x-[LIVM]-x-G-[LIVM]-G-x-[DEN]-G-H.
NAME: Phosphoglycerate mutase family phosphohistidine signature.
CONSENSUS: [LIVM]-x-R-H-G-[EQ]-x(3)-N.

NAME: Phosphoglucomutase and phosphomannomutase phosphoserine signature.
CONSENSUS: [GSA]-[LIVM]-x-[LIVM]-[ST]-[PGA]-S-H-x-P-x(4)-[GNHE].

NAME: Methylmalonyl-CoA mutase signature.
CONSENSUS: R-I-A-R-N-[TQ]-x(2)-[LIVMFY](2)-x-[EQ]-E-x(4)-[KRN]-x(2)-D-P-x-[GSA]-G-S.

NAME: Terpene synthases signature.
CONSENSUS: [DE]-G-S-W-x-G-x-W-[GA]-[LIVM]-x-[FY]-x-Y-[GA].

NAME: Eukaryotic DNA topoisomerase I active site.
CONSENSUS: [DEN]-x(6)-[GS]-[IT]-S-K-x(2)-Y-[LIVM]-x(3).

NAME: Prokaryotic DNA topoisomerase I active site.
CONSENSUS: [EQ]-x-L-Y-[DEQT]-x(3,12)-[LI]-[ST]-Y-x-R-[ST]-[DEQS].

NAME: DNA topoisomerase II signature.
CONSENSUS: [LIVMA]-x-E-G-[DN]-S-A-x-[STAG].

NAME: Aminoacyl-transfer RNA synthetases class-I signature.
CONSENSUS: P-x(0,2)-[GSTAN]-[DENQGAPK]-x-[LIVMFP]-[HT]-[LIVMYAC]-G-[HNTG]-[LIVMFYSTAGPC].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 1.
CONSENSUS: [FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE].

NAME: Aminoacyl-transfer RNA synthetases class-II signature 2.
CONSENSUS: [GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY].

NAME: WHEP-TRS domain signature.
CONSENSUS: [QY]-G-[DNEA]-x-[LIV]-[KR]-x(2)-K-x(2)-[KRNG]-[AS]-x(4)-[LIV]-[DENK]-x(2)-[IV]-x(2)-L-x(3)-K.

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 1.
CONSENSUS: S-[KR]-S-G-[GT]-[LIVM]-[GST]-x-[EQ]-x(8,10)-G-x(4)-[LIVM]-[GA]-[LIVM]-G-G-D.

NAME: ATP-citrate lyase / succinyl-CoA ligases family active site.
CONSENSUS: G-x(2)-A-x(4,7)-[RQT]-[LIVMF]-G-H-[AS]-[GH].

NAME: ATP-citrate lyase / succinyl-CoA ligases family signature 3.
CONSENSUS: G-x-[IV]-x(2)-[LIVMF]-x-[NA]-G-[GA]-G-[LA]-[STAV]-
x(4)-D-x-[LIVM]-x(3)-
CONSENSUS: G-[GRE].

NAME: Glutamine synthetase signature 1.
CONSENSUS: [FYWL]-D-G-S-S-x(6,8)-[DENQSTAK]-[SA]-[DE]-x(2)-
[LIVMFY].

NAME: Glutamine synthetase putative ATP-binding region signature.
CONSENSUS: K-P-[LIVMFYA]-x(3,5)-[NPAT]-G-[GSTAN]-G-x-H-x(3)-
S.

NAME: Glutamine synthetase class-I adenylation site.
CONSENSUS: K-[LIVM]-x(5)-[LIVMA]-D-[RK]-[DN]-[LI]-Y.

NAME: D-alanine--D-alanine ligase signature 1.
CONSENSUS: H-G-x(2)-G-E-D-G-x-[LIVMA]-[QSA]-[GSA].

NAME: D-alanine--D-alanine ligase signature 2.
CONSENSUS: [LIV]-x(3)-[GA]-x-[GSAIV]-R-[LIVCA]-D-[LIVMF](2)-
x(7,9)-[LI]-x-E-
CONSENSUS: [LIVA]-N-[STP]-x-P-[GA].

NAME: SAICAR synthetase signature 1.
CONSENSUS: [LIVMF](2)-P-[LIVM]-E-x-[LIVM]-[LIVMCA]-R-x(3)-
[TA]-G-S.

NAME: SAICAR synthetase signature 2.
CONSENSUS: [LIVM]-[LIVMA]-D-x-K-[LIVMFY]-E-F-G.

NAME: Folylpolyglutamate synthase signature 1.
CONSENSUS: [LIVMFY]-x-[LIVM]-[STAG]-G-T-[NK]-G-K-x-[ST]-x(7)-
[LIVM](2)-x(3)-[GSK].

NAME: Folylpolyglutamate synthase signature 2.
CONSENSUS: [LIVMFY](2)-E-x-G-[LIVM]-[GA]-G-x(2)-D-x-[GST]-x-
[LIVM](2).

NAME: Ubiquitin-activating enzyme signature 1.
CONSENSUS: K-A-C-S-G-K-F-x-P.

NAME: Ubiquitin-activating enzyme active site.
CONSENSUS: P-[LIVM]-C-T-[LIVM]-[KRH]-x-[FT]-P.

NAME: Ubiquitin-conjugating enzymes active site.
CONSENSUS: [FYWLSP]-H-[PC]-[NH]-[LIV]-x(3,4)-G-x-[LIV]-C-
[LIV]-x-[LIV].

NAME: Formate--tetrahydrofolate ligase signature 1.
CONSENSUS: G-[LIVM]-K-G-G-A-A-G-G-G-Y.

NAME: Formate--tetrahydrofolate ligase signature 2.
CONSENSUS: V-A-T-[IV]-R-A-L-K-x-[HN]-G-G.

NAME: Adenylosuccinate synthetase GTP-binding site.

NAME: Adenylosuccinate synthetase active site.
CONSENSUS: G-I-[GR]-P-x-Y-x(2)-K-x(2)-R.

NAME: Argininosuccinate synthase signature 1.
CONSENSUS: A-[FY]-S-G-G-L-D-T-S.

NAME: Argininosuccinate synthase signature 2.
CONSENSUS: G-x-T-x-K-G-N-D-x(2)-R-F.

NAME: Phosphoribosylglycinamide synthetase signature.
CONSENSUS: R-F-G-D-P-E-x-[QM].

NAME: Carbamoyl-phosphate synthase subdomain signature 1.
CONSENSUS: [FYV]-[PS]-[LIVMC]-[LIVMA]-[LIVM]-[KR]-[PSA]-
[STA]-x(3)-[SG]-G-x-[AG].

NAME: Carbamoyl-phosphate synthase subdomain signature 2.
CONSENSUS: [LIVMF]-[LIMN]-E-[LIVMCA]-N-[PATLIVM]-[KRQ]-
[LIVMSTA].

NAME: ATP-dependent DNA ligase AMP-binding site.
CONSENSUS: [EDQH]-x-K-x-[DN]-G-x-R-[GACIVM].

NAME: ATP-dependent DNA ligase signature 2.
CONSENSUS: E-G-[LIVMA]-[LIVM](2)-[KR]-x(5,8)-[YW]-[QNEK]-
x(2,6)-[KRH]-x(3,5)-K-
CONSENSUS: [LIVMFY]-K.

NAME: NAD-dependent DNA ligase signature 1.
CONSENSUS: K-[LIVM]-D-G-[LIVM]-[SA]-x(4)-Y-x(2)-G-x-L-x(4)-
[ST]-R-G-[DN]-G-x(2)-G-
CONSENSUS: [DE]-[DENL].

NAME: NAD-dependent DNA ligase signature 2.
CONSENSUS: [IV]-G-[KR]-[ST]-G-x-[LIVM]-[STNK]-x-[VT]-x(2)-L-
x-[PS]-V.

NAME: RNA 3'-terminal phosphate cyclase signature.
CONSENSUS: [RH]-G-x(2)-P-x-G(3)-x-[LIV].

NAME: Lipoate-protein ligase B signature.
CONSENSUS: R-G-G-x(2)-T-[FYW]-H-x(2)-[GH]-Q-x-[LIV]-x-Y.

NAME: Isopenicillin N synthetase signature 1.
CONSENSUS: [RK]-x-[STA]-x(2)-S-x-C-Y-[SL].

NAME: Isopenicillin N synthetase signature 2.
CONSENSUS: [LIVM](2)-x-C-G-[STA]-x(2)-[STAG]-x(2)-T-x-[DNG].

NAME: Site-specific recombinases active site.
CONSENSUS: Y-[LIVAC]-R-[VA]-S-[ST]-x(2)-Q.

NAME: Site-specific recombinases signature 2.
CONSENSUS: G-[DE]-x(2)-[LIVM]-x(3)-[LIVM]-[DT]-R-[LIVM]-
[GSA].

NAME: Transposases, Mutator family, signature.
CONSENSUS: D-x(3)-G-[LIVMF]-x(6)-[STAV]-[LIVMFYW]-[PT]-x-
[STAV]-x(2)-[QR]-x-C-x(2)-
CONSENSUS: H.

NAME: Transposases, IS30 family, signature.
CONSENSUS: R-G-x(2)-E-N-x-N-G-[LIVM](2)-R-[QE]-[LIVMFY](2)-P-
K.

NAME: Autoinducer synthetases family signature.
CONSENSUS: [LMFY]-R-x(3)-F-x(2)-[KR]-x(2)-W-x-[LIVM]-x(6,7)-
E-x-D-x-[FY]-D.

NAME: Thiamine pyrophosphate enzymes signature.
CONSENSUS: [LIVMF]-[GSA]-x(5)-P-x(4)-[LIVMFYW]-x-[LIVMF]-x-G-
D-[GSA]-[GSAC].

NAME: Biotin-requiring enzymes attachment site.
CONSENSUS: [GN]-[DEQTR]-x-[LIVMFY]-x(2)-[LIVM]-x-[AIV]-M-K-
[LMAT]-x(3)-[LIVM]-x-
CONSENSUS: [SAV].

NAME: 2-oxo acid dehydrogenases acyltransferase component lipoyl binding site.
CONSENSUS: [GN]-x(2)-[LIVF]-x(5)-[LIVFC]-x(2)-[LIVFA]-x(3)-K-
[STAIV]-[STAVQDN]-
CONSENSUS: x(2)-[LIVMFS]-x(5)-[GCN]-x-[LIVMFY].

NAME: Putative AMP-binding domain signature.
CONSENSUS: [LIVMFY]-x(2)-[STG]-[STAG]-G-[ST]-[STEI]-[SG]-x-
[PASLIVM]-[KR].

NAME: Molybdenum cofactor biosynthesis proteins signature 1.
CONSENSUS: [LIVM](3)-[LIT](2)-G-G-T-G-x(4)-D.

NAME: Molybdenum cofactor biosynthesis proteins signature 2.
CONSENSUS: S-x-[GS]-x(2)-D-x(5)-[LIVW]-x(10,12)-[LIV]-x(2)-
[KR]-P-G-[KRL]-P-x(2)-
CONSENSUS: [LIVMF]-[GA].

NAME: moaA / nifB / pqqE family signature.
CONSENSUS: [LIV]-x(3)-C-[NP]-[LIVMF]-[QRS]-C-x-[FYM]-C.

NAME: Radical activating enzymes signature.
CONSENSUS: [GV]-x-G-x-[KR]-x(3)-F-x(2)-G-x(4)-C-x(3)-C-
x(2)-C-x-[NL].

NAME: Tpx family signature.
CONSENSUS: S-x-D-L-P-F-A-x(2)-[KR]-[FW]-C.

NAME: Cytochrome c family heme-binding site signature.
CONSENSUS: C-{CPWHF}-{CPWR}-C-H-{CFYW}.
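The consensus lines in this listing follow PROSITE pattern notation: one-letter residue codes separated by hyphens, x for any residue, [..] for allowed alternatives, {..} for excluded residues, (n) or (n,m) for repeat counts, and < or > for anchoring at the N- or C-terminus. As a minimal sketch, assuming a pattern has already been cleaned into that notation, the Python below converts a consensus line into a regular expression with the standard `re` module and scans a sequence with it. The helper names prosite_to_regex and find_motifs are hypothetical and used for illustration only; the sketch does not handle anchors written inside brackets (such as [G>]) and reports only non-overlapping matches.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style consensus pattern into a Python regex string.

    Supported elements: single residues (C), any residue (x), allowed sets
    ([LIVM]), excluded sets ({CPWR}), repeat counts ((2) or (2,4)), and the
    terminal anchors '<' (N-terminus) and '>' (C-terminus).
    """
    pattern = pattern.strip().rstrip(".")
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Split off an optional repeat count such as "(2)" or "(2,4)".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"cannot parse pattern element {element!r}")
        core, count = match.group(1), match.group(2)
        if core == "x":
            regex = "."  # any residue
        elif core.startswith("{") and core.endswith("}"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core  # a fixed residue, or an allowed set such as [LIVM]
        if count:
            regex += "{" + count + "}"
        parts.append(prefix + regex + suffix)
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched text) for each non-overlapping hit."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in compiled.finditer(sequence.upper())]


if __name__ == "__main__":
    heme_site = "C-{CPWHF}-{CPWR}-C-H-{CFYW}."  # cytochrome c heme-binding site
    print(prosite_to_regex(heme_site))  # C[^CPWHF][^CPWR]CH[^CFYW]
    print(find_motifs(heme_site, "GDVEKGKKIFVQKCAQCHTVEKGG"))  # [(14, 'CAQCHT')]
```

Excluded sets map directly onto negated character classes, which keeps the translation element-by-element; a production scanner would also need overlapping-match reporting and the bracketed-anchor cases that this sketch omits.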



NAME: Cytochrome b5 family, heme-binding domain signature.
CONSENSUS: [FY]-[LIVMK]-x(2)-H-P-[GA]-G.

NAME: Cytochrome b/b6 heme-ligand signature.
CONSENSUS: [DENQ]-x(3)-G-[FYWMQ]-x-[LIVMF]-R-x(2)-H.

NAME: Cytochrome b/b6 Qo site signature.
CONSENSUS: P-[DE]-W-[FY]-[LFY](2).

NAME: Cytochrome b559 subunits heme-binding site signature.
CONSENSUS: [LIV]-x-[ST]-[LIVF]-R-[FYW]-x(2)-[IV]-H-[STGA]-
[LIV]-[STGA]-[IV]-P.

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 1.
CONSENSUS: R-[LIVMFYW]-x-H-W-[LIVM]-x(2)-[LIVMF]-[STAC]-
[LIVM]-x(2)-L-x-[LIVM]-T-G.

NAME: Nickel-dependent hydrogenases b-type cytochrome subunit signature 2.
CONSENSUS: [RH]-[STA]-[LIVMFYW]-H-[RH]-[LIVM]-x(2)-W-x-
[LIVMF]-x(2)-F-x(3)-H.

NAME: Succinate dehydrogenase cytochrome b subunit signature 1.
CONSENSUS: R-P-[LIVMT]-x(3)-[LIVM]-x(6)-[LIVMWPK]-x(6)-S-
x(2)-H-R-x-[ST].

NAME: Succinate dehydrogenase cytochrome b subunit signature 2.
CONSENSUS: H-x(3)-[GA]-[LIVMT]-R-[HF]-[LIVMF]-x-[FYWM]-D-x-
[GVA].

NAME: Thioredoxin family active site.
CONSENSUS: [LIVMF]-[LIVMSTA]-x-[LIVMFYC]-[FYWSTHE]-x(2)-
[FYWGTN]-C-[GATPLVE]-
CONSENSUS: [PHYWSTA]-C-x(6)-[LIVMFYWT].

NAME: Glutaredoxin active site.
CONSENSUS: [LIV]-[FYSA]-x(4)-C-[PV]-[FYW]-C-x(2)-[TAV]-
x(2,3)-[LIV].

NAME: Type-1 copper (blue) proteins signature.
CONSENSUS: [GA]-x(0,2)-[YSA]-x(0,1)-[VFY]-x-C-x(1,2)-[PG]-
x(0,1)-H-x(2,4)-[MQ].

NAME: 2Fe-2S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-{C}-{C}-[GA]-{C}-C-[GAST]-{CPDEKRHFYW}-C.

NAME: Adrenodoxin family, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-[STAQ]-x-[STAM]-C-[STA]-T-C-[HR].

NAME: 4Fe-4S ferredoxins, iron-sulfur binding region signature.
CONSENSUS: C-x(2)-C-x(2)-C-x(3)-C-[PEG].

NAME: High potential iron-sulfur proteins signature.
CONSENSUS: C-x(6,9)-[LIVM]-x(3)-G-[YW]-C-x(2)-[FYW].

NAME: Rieske iron-sulfur protein signature 1.
CONSENSUS: C-[TK]-H-L-G-C-[LIVT].

NAME: Rieske iron-sulfur protein signature 2.
CONSENSUS: C-P-C-H-x-[GSA].

NAME: Flavodoxin signature.
CONSENSUS: [LIV]-[LIVFY]-[FY]-x-[ST]-x(2)-[AGC]-x-T-x(3)-A-
x(2)-[LIV].

NAME: Rubredoxin signature.
CONSENSUS: [LIVM]-x(3)-W-x-C-P-x-C-[AGD].

NAME: Electron transfer flavoprotein alpha-subunit signature.
CONSENSUS: [LI]-Y-[LIVM]-[AT]-x-G-[IV]-[SD]-G-x-[IV]-Q-H-
x(2)-G-x(6)-[IV]-x-A-
CONSENSUS: [IV]-N.

NAME: Electron transfer flavoprotein beta-subunit signature.
CONSENSUS: [IVA]-x-[KR]-x(2)-[DE]-[GD]-[GDE]-x(1,2)-[EQ]-x-
[LIV]-x(4)-P-x-[LIVM](2)-
CONSENSUS: [TAC].

NAME: Vertebrate metallothioneins signature.
CONSENSUS: C-x-C-[GSTAP]-x(2)-C-x-C-x(2)-C-x-C-x(2)-C-x-K.

NAME: Ferritin iron-binding regions signature 1.
CONSENSUS: E-x-[KR]-E-x(2)-E-[KR]-[LF]-[LIVMA]-x(2)-Q-N-x-R-
x-G-R.

NAME: Ferritin iron-binding regions signature 2.
CONSENSUS: D-x(2)-[LIVMF]-[STAC]-[DH]-F-[LI]-[EN]-x(2)-[FY]-
L-x(6)-[LIVM]-[KN].

NAME: Bacterioferritin signature.
CONSENSUS: <M-x-G-x(3)-V-[LIV]-x(2)-[LM]-x(3)-L-x(3)-L.

NAME: Transferrins signature 1.
CONSENSUS: Y-x(0,1)-[VAS]-V-[IVAC]-[IVA]-[IVA]-[RQH]-[RKS]-
[GDENSA].

NAME: Transferrins signature 2.
CONSENSUS: Y-x-G-A-[FL]-[KRHNQ]-C-L-x(3,4)-G-[DEN]-V-[GA]-
[FYW].

NAME: Transferrins signature 3.
CONSENSUS: [DENQ]-[YF]-x-[LY]-L-C-x-[DN]-x(5,8)-[LIV]-x(4,5)-
C-x(2)-A-x(4)-[HQR]-x-
CONSENSUS: [LIVMFYW]-[LIVM].

NAME: Globins profile.

NAME: Protozoan/cyanobacterial globins signature.
CONSENSUS: F-[LF]-x(5)-G-[PA]-x(4)-G-[KRA]-x-[LIVM]-x(3)-H.

NAME: Plant hemoglobins signature.
CONSENSUS: [SN]-P-x-L-x(2)-H-A-x(3)-F.

NAME: Hemerythrins signature.
CONSENSUS: W-L-x-[NQ]-H-I-x(3)-D-F.

NAME: Arthropod hemocyanins / insect LSPs signature 1.
CONSENSUS: Y-[FYW]-x-E-D-[LIVM]-x(2)-N-x(6)-H-x(3)-P.

NAME: Arthropod hemocyanins / insect LSPs signature 2.
CONSENSUS: T-x(2)-R-D-P-x-[FY]-[FYW].

NAME: Heavy-metal-associated domain.
CONSENSUS: [LIVN]-x(2)-[LIVMFA]-x-C-x-[STAGCDNH]-C-x(3)-
[LIVFG]-x(3)-[LIV]-x(9,11)-
CONSENSUS: [IVA]-x-[LVFYS].

NAME: ABC transporters family signature.
CONSENSUS: [LIVMFYC]-[SA]-[SAPGLVFYKQH]-G-[DENQMW]-
[KRQASPCLIMFW]-[KRNQSTAVM]-
CONSENSUS: [KRACLVM]-[LIVMFYPAN]-{PHY}-[LIVMFW]-[SAGCLIVP]-
{FYWHP}-{KRHP}-
CONSENSUS: [LIVMFYWSTA].

NAME: Binding-protein-dependent transport systems inner membrane comp. sign.
CONSENSUS: [LIVMFY]-x(8)-[EQR]-[STAGV]-[STAG]-x(3)-G-
[LIVMFYSTAC]-x(5)-[LIVMFYSTA]-
CONSENSUS: x(4)-[LIVMFY]-[PKR].

NAME: ABC-2 type transport system integral membrane proteins signature.
CONSENSUS: [LIMST]-x(2)-[LIMW]-x(2)-[LIMCA]-[GSTC]-x-[GSAIV]-
x(6)-[LIMGA]-[PGSNQ]-
CONSENSUS: x(7,12)-P-[LIMFT]-x-[HRSY]-x(5)-[RQ].

NAME: Bacterial extracellular solute-binding proteins, family 1 signature.
CONSENSUS: [GAP]-[LIVMFA]-[STAVDN]-x(4)-[GSAV]-[LIVMFY](2)-Y-
[ND]-x(3)-[LIVMF]-x-
CONSENSUS: [KNDE].

NAME: Bacterial extracellular solute-binding proteins, family 3 signature.
CONSENSUS: G-[FYIL]-[DE]-[LIVMT]-[DE]-[LIVMF]-x(3)-[LIVMA]-
[VAGC]-x(2)-[LIVMAGN].

NAME: Bacterial extracellular solute-binding proteins, family 5 signature.
CONSENSUS: [AG]-x(6,7)-[DNEG]-x(2)-[STAVE]-[LIVMFYWA]-x-
[LIVMFY]-x-[LIVM]-[KR]-
CONSENSUS: [KRHDE]-[GDN]-[LIVMA]-[KNGSP]-[FW].

NAME: Serum albumin family signature.
CONSENSUS: [FY]-x(6)-C-C-x(7)-C-[LFY]-x(6)-[LIVMFYW].

NAME: Transthyretin signature 1.
CONSENSUS: S-K-C-P-L-M-V-K-V-L-D-[AS]-V-R-G.

NAME: Transthyretin signature 2.
CONSENSUS: S-P-[FY]-S-[FY]-S-T-T-A-[LIVM]-V-[ST]-x-P.

NAME: Avidin / Streptavidin family signature.
CONSENSUS: [DEN]-x(2)-[KR]-[STA]-x(2)-V-G-x-[DN]-x-[FW]-T-
[KR].

NAME: Eukaryotic cobalamin-binding proteins signature.
CONSENSUS: [SN]-V-D-T-[GA]-A-[LIVM]-A-x-L-A-[LIVMF]-T-C.

NAME: Lipocalin signature.
CONSENSUS: [DENG]-x-[DENQGSTARK]-x(0,2)-[DENQARK]-[LIVFY]-
{CP}-G-{C}-W-[FYWLRH]-x-
CONSENSUS: [LIVMTA].

NAME: Cytosolic fatty-acid binding proteins signature.
CONSENSUS: [GSAIV]-x-[FYW]-x-[LIVMF]-x(4)-[NHG]-[FY]-[DE]-x-
[LIVMFY]-[LIVM]-x(2)-
CONSENSUS: [LIVMAKR].

NAME: Acyl-CoA-binding protein signature.
CONSENSUS: P-[STA]-x-[DEN]-x-[LIVMF]-x(2)-[LIVMFY]-Y-[GSTA]-
x-[FY]-K-Q-[STA](2)-x-G.

NAME: LBP / BPI / CETP family signature.
CONSENSUS: [PA]-[GA]-[LIVMC]-x(2)-R-[IV]-[ST]-x(3)-L-x(5)-
[EQ]-x(4)-[LIVM]-[EQ]-
CONSENSUS: x(8)-P.

NAME: Phosphatidylethanolamine-binding protein family signature.
CONSENSUS: [FY]-x-[LIVMF](3)-x-[DC]-P-D-x-P-[SN]-x(10)-H.

NAME: Plant lipid transfer proteins signature.
CONSENSUS: [LIVM]-[PA]-x(2)-C-x-[LIVM]-x-[LIVM]-x-[LIVMFY]-x-
[LIVM]-[ST]-x(3)-
CONSENSUS: [DN]-C-x(2)-[LIVM].

NAME: Uteroglobin family signature 1.
CONSENSUS: [GA]-x(3)-I-C-P-x-[LIVMF]-x(3)-[LIVM]-[DE]-x-
[LIVMF](2).

NAME: Uteroglobin family signature 2.
CONSENSUS: [DEQ]-x(4)-[SN]-x(5)-[DEQ]-x-I-x(2)-S-[PSE]-[LS]-
C.

NAME: Mitochondrial energy transfer proteins signature.
CONSENSUS: P-x-[DE]-x-[LIVAT]-[RK]-x-[LRH]-[LIVMFY]-[QGAIVM].

NAME: Sugar transport proteins signature 1.
CONSENSUS: [LIVMSTAG]-[LIVMFSAG]-x(2)-[LIVMSA]-[DE]-x-
[LIVMFYWA]-G-R-[RK]-x(4,6)-
CONSENSUS: [GSTA].

NAME: Sugar transport proteins signature 2.
CONSENSUS: [LIVMF]-x-G-[LIVMFA]-x(2)-G-x(8)-[LIFY]-x(2)-[EQ]-
x(6)-[RK].

NAME: LacY family proton/sugar symporters signature 1.
CONSENSUS: G-[LIVM](2)-x-D-[RK]-L-G-L-[RK](2)-x-[LIVM](5)-|j).



WO 01/98454 PCT/IB01/02050 

NAME: LacY family proton/sugar symporters signature 2.
CONSENSUS: P-x-[LIVMF](2)-N-R-[LIVM]-G-x-K-N-[STA]-[LIVM](3).

NAME: PTR2 family proton/oligopeptide symporters signature 1.
CONSENSUS: [GA]-[GAS]-[LIVMFYWA]-[LIVM]-[GAS]-D-x-[LIVMFYWT]-
[LIVMFYW]-G-x(3)-[TAV]-
CONSENSUS: [IV]-x(3)-[GSTAV]-x-[LIVMF]-x(3)-[GA].

NAME: PTR2 family proton/oligopeptide symporters signature 2.
CONSENSUS: [FYT]-x(2)-[LMFY]-[FYV]-[LIVMFYWA]-x-[IVG]-N-
[LIVMAG]-G-[GSA]-[LIMF].

NAME: Amiloride-sensitive sodium channels signature.
CONSENSUS: Y-x(2)-[EQTF]-x-C-x(2)-[GSTDNL]-C-x-[QT]-x(2)-
[LIVMT]-[LIVMS]-x(2)-C-x-C.

NAME: Sodium:alanine symporter family signature.
CONSENSUS: G-G-x-[GA](2)-[LIVM]-F-W-M-W-[LIVM]-x-[STAV]-
[LIVMFA](2)-G.

NAME: Sodium:dicarboxylate symporter family signature 1.
CONSENSUS: P-x(0,1)-G-[DE]-x-[LIVMF](2)-x-[LIVM](2)-[KREQ]-
[LIVM](3)-x-P.

NAME: Sodium:dicarboxylate symporter family signature 2.
CONSENSUS: P-x-G-x-[STA]-x-[NT]-[LIVMC]-D-G-[STAN]-x-[LIVM]-
[FY]-x(2)-[LIVM]-x(2)-
CONSENSUS: [LIVN]-[FY]-[LI]-[SA]-Q.

NAME: Sodium:galactoside symporter family signature.
CONSENSUS: D-x(3)-G-x(3)-[DN]-x(6,8)-G-[KH]-F-[KR]-P-[FYW]-
[LIVM](2)-x-[GSTA](2).

NAME: Sodium:neurotransmitter symporter family signature 1.
CONSENSUS: W-R-F-[GP]-Y-x(4)-N-G-G-G-x-[FY].

NAME: Sodium:neurotransmitter symporter family signature 2.
CONSENSUS: Y-[LIVMFY]-x(2)-[SC]-[LIVMFY]-[STQ]-x(2)-L-P-W-
x(2)-C-x(4)-N-[GST].

NAME: Sodium:solute symporter family signature 1.
CONSENSUS: [GS]-x(2)-[LIY]-x(3)-[LIVMFYWSTAG](10)-[LIY]-
[TAV]-x(2)-G-G-[LMF]-x-
CONSENSUS: [SAP].

NAME: Sodium:solute symporter family signature 2.
CONSENSUS: [GAST]-[LIVM]-x(3)-[KR]-x(4)-G-A-x(2)-[GAS]-
[LIVMGS]-[LIVMW]-[LIVMGAT]-G-
CONSENSUS: x-[LIVMG].

NAME: Sodium:sulfate symporter family signature.
CONSENSUS: [STACP]-S-x(2)-F-x(2)-P-[LIVM]-[GSA]-x(3)-N-x-
[LIVM]-V.

NAME: glpT family of transporters signature.
CONSENSUS: R-G-x(5)-W-N-x(2)-H-N-x-G-G.

NAME: Ammonium transporters signature.
CONSENSUS: D-[FYWS]-A-G-[GSC]-x(2)-[IV]-x(3)-[SAG](2)-x(2)-
[SAG]-[LIVMF]-x(3)-
CONSENSUS: [LIVMFYWA](2)-x-[GK]-x-R.

NAME: BCCT family of transporters signature.
CONSENSUS: [GSDN]-W-T-[LIVM]-x-[FY]-W-x-W-W.

NAME: Flagellar motor protein motA family signature.
CONSENSUS: A-[LMF]-x-[GAT]-T-[LIVF]-x-G-x-[LIVMF]-x(7)-P.

NAME: Formate and nitrite transporters signature 1.
CONSENSUS: [LIVMA]-[LIVMY]-x-G-[GSTA]-[DES]-L-[FI]-[TN]-[GS].

NAME: Formate and nitrite transporters signature 2.
CONSENSUS: [GA]-x(2)-[CA]-N-[LIVMFYW](2)-V-C-[LV]-A.

NAME: Prokaryotic sulfate-binding proteins signature 1.
CONSENSUS: K-x-[NQEK]-[GT]-G-[DQ]-x-[LIVM]-x(3)-Q-S.

NAME: Prokaryotic sulfate-binding proteins signature 2.
CONSENSUS: N-P-K-[ST]-S-G-x-A-R.

NAME: Sulfate transporters signature.
CONSENSUS: P-x-Y-[GS]-L-Y-[STAG](2)-x(4)-[LIVMFY](3)-x(3)-
[GSTA](2)-S-[KR].

NAME: Amino acid permeases signature.
CONSENSUS: [STAGC]-G-[PAG]-x(2,3)-[LIVMFYWA](2)-x-[LIVMFYW]-
x-[LIVMFWSTAGC](2)-
CONSENSUS: [STAGC]-x(3)-[LIVMFYW]-x-[LIVMST]-x(3)-[LIMCTA]-
[GA]-E-x(5)-[PSAL].

NAME: Aromatic amino acids permeases signature.
CONSENSUS: I-G-[GA]-G-M-[LF]-[SA]-x-P-x(3)-[SA]-G-x(2)-F.

NAME: Xanthine/uracil permeases family signature.
CONSENSUS: [LIVM]-P-x-[PASIF]-V-[LIVM]-G-G-x(4)-[LIVM]-[FY]-
[GSA]-x-[LIVM]-x(3)-G.

NAME: Anion exchangers family signature 1.
CONSENSUS: F-G-G-[LIVM](2)-[KR]-D-[LIVM]-[RK]-R-R-Y.

NAME: Anion exchangers family signature 2.
CONSENSUS: [FI]-L-I-S-L-I-F-I-Y-E-T-F-x-K-L.

NAME: MIP family signature.
CONSENSUS: [HNQA]-x-N-P-[STA]-[LIVMF]-[ST]-[LIVMF]-[GSTAFY].

NAME: General diffusion Gram-negative porins signature.
CONSENSUS: [LIVMFY]-x(2)-G-x(2)-Y-x-F-x-K-x(2)-[SN]-[STAV]-
[LIVMFYW]-V.

NAME: OmpA-like domain.
CONSENSUS: [LIVMA]-x-[GT]-x-[TA]-[DA]-x(2)-[DG]-[GSTP]-x(2)-
[LFYDE]-[NQS]-x(2)-
CONSENSUS: [LI]-[SG]-[EQ]-[KRQE]-R-A-x(2)-[LV]-x(3)-[LIVMF]-
x(4,5)-[LIVM]-x(4)-
CONSENSUS: [LIVM]-x(3)-[SG]-x-G.

NAME: Eukaryotic mitochondrial porin signature.
CONSENSUS: [YH]-x(2)-D-[SPA]-x-[STA]-x(3)-[TAG]-[KR]-[LIVMF]-
[DNSTA]-[DNS]-x(4)-
CONSENSUS: [GSTAN]-[LIVMA]-x-[LIVMY].

NAME: Insulin-like growth factor binding proteins signature.
CONSENSUS: G-C-[GS]-C-C-x(2)-C-A-x(6)-C.

NAME: GPR1/FUN34/yaaH family signature.
CONSENSUS: N-P-[AV]-P-[LF]-G-L-x-[GSA]-F.

NAME: GNS1/SUR4 family signature.
CONSENSUS: L-x-F-L-H-x-Y-H-H.

NAME: 43 Kd postsynaptic protein signature.
CONSENSUS: G-Q-D-Q-T-K-Q-Q-I.

NAME: Actins signature 1.
CONSENSUS: [FY]-[LIV]-G-[DE]-E-A-Q-x-[RKQ](2)-G.

NAME: Actins signature 2.
CONSENSUS: W-[IV]-[STA]-[RK]-x-[DE]-Y-[DNE]-[DE].

NAME: Actins and actin-related proteins signature.
CONSENSUS: [LM]-[LIVM]-T-E-[GAPQ]-x-[LIVMFYWHQ]-N-[PSTA]-
x(2)-N-[KR].

NAME: Annexins repeated domain signature.
CONSENSUS: [TG]-[STV]-x(8)-[LIVMF]-x(2)-R-x(3)-[DEQNH]-x(7)-
[IFY]-x(7)-[LIVMF]-
CONSENSUS: x(3)-[LIVMF]-x(11)-[LIVMFA]-x(2)-[LIVMF].

NAME: Caveolins signature.
CONSENSUS: F-E-D-V-I-A-E-P.

NAME: Clathrin light chain signature 1.
CONSENSUS: F-L-A-Q-Q-E-S.

NAME: Clathrin light chain signature 2.
CONSENSUS: [KR]-D-x-S-[KR]-[LIVM]-[KR]-x-[LIVM](3)-x-L-K.

NAME: Clusterin signature 1.
CONSENSUS: C-K-P-C-L-K-x-T-C.

NAME: Clusterin signature 2.
CONSENSUS: C-L-[RK]-M-[RK]-x-[EQ]-C-[ED]-K-C.

NAME: Connexins signature 1.
CONSENSUS: C-[DN]-T-x-Q-P-G-C-x(2)-V-C-Y-D.

NAME: Connexins signature 2.
CONSENSUS: C-x(3,4)-P-C-x(3)-[LIVM]-[DEN]-C-[FY]-[LIVM]-[SA]-
[KR]-P.

NAME: Crystallins beta and gamma 'Greek key' motif signature.
CONSENSUS: [LIVMFYWA]-x-[DEHRKSTP]-[FY]-[DEQHKY]-x(3)-[FY]-x-
G-x(4)-[LIVMTCS].

NAME: Dynamin family signature.
CONSENSUS: L-P-[RK]-G-[STN]-[GN]-[LIVM]-V-T-R.

NAME: Dynein light chain type 1 signature.
CONSENSUS: H-x-I-x-G-[KR]-x-F-[GA]-S-x-V-[ST]-[HY]-E.

NAME: FtsZ protein signature 1.
CONSENSUS: N-[ST]-D-x-Q-x-L-x(16,18)-G-x-G-[ATV]-G-[GSAN]-x-
P-x(2)-G.

NAME: FtsZ protein signature 2.
CONSENSUS: [NHKR]-[LIVMF]-x-[LIVMF](2)-[VSTAC]-[STAC]-G-x-G-
[GK]-G-T-G-[ST]-G-
CONSENSUS: [GSAR]-[STA]-P-[LIVMFT]-[LIVMF]-[SGAV].

NAME: Fungal hydrophobins signature.
CONSENSUS: [GN]-[DNQPSA]-x-C-[GSTANK]-[GSTADNQ]-[STNQ]-
[PTIV]-x-C-C-[DENQKPST].

NAME: Intermediate filaments signature.
CONSENSUS: [IV]-x-[TACI]-Y-[RKH]-x-[LM]-L-[DE].

NAME: Involucrin signature.
CONSENSUS: <M-S-[QH]-Q-x-T-[LV]-P-V-T-[LV].

NAME: Kinesin motor domain signature.
CONSENSUS: [GSA]-[KRHPSTQVM]-[LIVMF]-x-[LIVMF]-[IVC]-D-L-
[AH]-G-[SAN]-E.

NAME: Kinesin motor domain profile.

NAME: Kinesin light chain repeat.
CONSENSUS: [EQR]-A-L-x(3)-[GEQ]-x(3)-G-x-[DNS]-x-P-x-V-A-
x(3)-N-x-L-[AS]-
CONSENSUS: x(5)-[QR]-x-[KR]-[FY]-x(2)-[AV]-x(4)-[HKNQ].

NAME: Myelin basic protein signature.
CONSENSUS: V-V-H-F-F-K-N.

NAME: Myelin P0 protein signature.
CONSENSUS: S-[KR]-S-x-K-[AG]-x-[SA]-E-K-K-[STA]-K.

NAME: Myelin proteolipid protein signature 1.
CONSENSUS: G-[MV]-A-L-F-C-G-C-G-H.

NAME: Myelin proteolipid protein signature 2.
CONSENSUS: C-x-[ST]-x-[DE]-x(3)-[ST]-[FY]-x-L-[FY]-I-x(4)-G-
A.

NAME: Neuromodulin (GAP-43) signature 1.
CONSENSUS: <M-L-C-C-[LIVM]-R-R.

NAME: Neuromodulin (GAP-43) signature 2.
CONSENSUS: S-F-R-G-H-I-x-R-K-K-[LIVM].

NAME: Osteopontin signature.
CONSENSUS: [KQ]-x-[TA]-x(2)-[GA]-S-S-E-E-K.

NAME: Peripherin / rom-1 signature.
CONSENSUS: D-[GS]-V-P-F-[ST]-C-C-N-P-x-S-P-R-P-C.

NAME: Profilin signature.
CONSENSUS: <x(0,1)-[STA]-x(0,1)-W-[DENQH]-x-[YI]-x-[DEQ].

NAME: Surfactant associated polypeptide SP-C palmitoylation sites.
CONSENSUS: I-P-C-C-P-V.

NAME: Synapsins signature 1.
CONSENSUS: L-R-R-R-L-S-D-S.

NAME: Synapsins signature 2.
CONSENSUS: G-H-A-H-S-G-M-G-K-V-K.

NAME: Synaptobrevin signature.
CONSENSUS: N-[LIVM]-[DENS]-[KL]-V-x-[DEQ]-R-x(2)-[KR]-[LIVM]-
[STDE]-x-[LIVM]-x-[DE]-
CONSENSUS: [KR]-[TA]-[DE].

NAME: Synaptophysin / synaptoporin signature.
CONSENSUS: L-S-V-[DE]-C-x-N-K-T.

NAME: Tropomyosins signature.
CONSENSUS: L-K-E-A-E-x-R-A-E.

NAME: Tubulin subunits alpha, beta, and gamma signature.
CONSENSUS: [SAG]-G-G-T-G-[SA]-G.

NAME: Tubulin-beta mRNA autoregulation signal.
CONSENSUS: <M-R-[DE]-[IL].

NAME: Tau and MAP proteins tubulin-binding domain signature.
CONSENSUS: G-S-x(2)-N-x(2)-H-x-[PA]-[AG]-G(2).

NAME: Neuraxin and MAP1B proteins repeated region signature.
CONSENSUS: [STAGDN]-Y-x-Y-E-x(2)-[DE]-[KR]-[STAGC].

NAME: F-actin capping protein alpha subunit signature 1.
CONSENSUS: V-H-[FY](2)-E-D-G-N-V.

NAME: F-actin capping protein alpha subunit signature 2.
CONSENSUS: F-K-[AE]-L-R-R-x-L-P.

NAME: F-actin capping protein beta subunit signature.
CONSENSUS: C-D-Y-N-R-D.

NAME: Vinculin family talin-binding region signature.
CONSENSUS: [KR]-x-[LIVMF]-x(3)-[LIVMA]-x(2)-[LIVM]-x(6)-R-Q-
Q-E-L.

NAME: Vinculin repeated domain signature.
CONSENSUS: [LIVM]-x-[QA]-A-x(2)-W-[IL]-x-[DN]-P.

NAME: Amyloidogenic glycoprotein extracellular domain signature.
CONSENSUS: G-[VT]-E-[FY]-V-C-C-P.

NAME: Amyloidogenic glycoprotein intracellular domain signature.
CONSENSUS: G-Y-E-N-P-T-Y-[KR].

NAME: Cadherins extracellular repeated domain signature.
CONSENSUS: [LIV]-x-[LIV]-x-D-x-N-D-[NH]-x-P.

NAME: Insect cuticle proteins signature.
CONSENSUS: G-x(7)-[DEN]-G-x(6)-Y-x-A-[DNG]-x(2,3)-G-[FY]-x-
[AP].

NAME: Gas vesicles protein GVPa signature 1.
CONSENSUS: [LIVM]-x-[DE]-[LIVMFYT]-[LIVM]-[DE]-x-[LIVM](2)-
[DKR](2)-G-x-[LIVM](2).

NAME: Gas vesicles protein GVPa signature 2.
CONSENSUS: R-[LIVA](3)-A-[GS]-[LIVMFY]-x-T-x(3)-Y-[AG].

NAME: Gas vesicles protein GVPc repeated domain signature.
CONSENSUS: F-L-x(2)-T-x(3)-R-x(3)-A-x(2)-Q-x(3)-L-x(2)-F.

NAME: Bacterial microcompartments proteins signature.
CONSENSUS: D-x(0,1)-M-x-K-[SAG](2)-x-[IV]-x-[LIVM]-[LIVMA]-
[GCS]-x(4)-[GD]-[SGPD]-
CONSENSUS: [GA].

NAME: Flagella basal body rod proteins signature.
CONSENSUS: [GTARYQ]-x-[LIVMYSTA](2)-[GSTA]-[STADEN]-N-
[LIVM]-[SAN]-N-x-[SADNFR]-
CONSENSUS: [STV].

NAME: Flagella transport protein fliP family signature 1.
CONSENSUS: [PA]-A-[FY]-x-[LIVT]-[STH]-[EQ]-[LI]-x(2)-[GA]-F-
[KREQ]-[IM]-G-[LIF].

NAME: Flagella transport protein fliP family signature 2.
CONSENSUS: P-[LIVMF]-K-[LIVMF](5)-x-[LIVMA]-[NGS]-G-W.

NAME: Plant viruses icosahedral capsid proteins 'S' region signature.
CONSENSUS: [FYW]-x-[PSTA]-x(7)-G-x-[LIVM]-x-[LIVM]-x-[FYWI]-
x(2)-D-x(5)-P.

NAME: Potexviruses and carlaviruses coat protein signature.
CONSENSUS: [RK]-[FYW]-A-[GAP]-F-D-x-F-x(2)-[LV]-x(3)-
[GAST](2).

NAME: Neurotransmitter-gated ion-channels signature.
CONSENSUS: C-x-[LIVMFQ]-x-[LIVMF]-x(2)-[FY]-P-x-D-x(3)-C.

NAME: ATP P2X receptors signature.
CONSENSUS: G-G-x-[LIVM]-G-[LIVM]-x-[IV]-x-W-x-C-[DN]-L-D-
x(5)-C-x-P-x-Y-x-F.

NAME: G-protein coupled receptors signature.
CONSENSUS: [GSTALIVMFYWC]-[GSTANCPDE]-{EDPKRH}-x(2)-
[LIVMNQGA]-x(2)-[LIVMFT]-
CONSENSUS: [GSTANC]-[LIVMFYWSTAC]-[DENH]-R-[FYWCSH]-x(2)-
[LIVM].

NAME: G-protein coupled receptors family 2 signature 1.
CONSENSUS: C-x(3)-[FYWLIV]-D-x(3,4)-C-[FW]-x(2)-[STAGV]-
x(8,9)-C-[PF].

NAME: G-protein coupled receptors family 2 signature 2.
CONSENSUS: Q-G-[LMFCA]-[LIVMFT]-[LIV]-x-[LIVFST]-[LIF]-
[VFYH]-C-[LFY]-x-N-x(2)-V.

NAME: G-protein coupled receptors family 3 signature 1.
CONSENSUS: [LV]-x-N-[LIVM](2)-x-L-F-x-I-[PA]-Q-[LIVM]-[STA]-
x-[STA](3)-[STAN].

NAME: G-protein coupled receptors family 3 signature 2.
CONSENSUS: C-C-[FYW]-x-C-x(2)-C-x(4)-[FYW]-x(2,4)-[DN]-x(2)-
[STAH]-C-x(2)-C.

NAME: G-protein coupled receptors family 3 signature 3.
CONSENSUS: F-N-E-[STA]-K-x-I-[STAG]-F-[ST]-M.

NAME: Visual pigments (opsins) retinal binding site.
CONSENSUS: [LIVMWAC]-[PGAC]-x(3)-[SAC]-K-[STALIMR]-[GSACPNV]-
[STACP]-x(2)-[DENF]-
CONSENSUS: [AP]-x(2)-[IY].

NAME: Bacterial rhodopsins signature 1.
CONSENSUS: R-Y-x-[DT]-W-x-[LIVMF]-[ST]-T-P-[LIVM](3).

NAME: Bacterial rhodopsins retinal binding site.
CONSENSUS: [FYIV]-x-[FYVG]-[LIVM]-D-[LIVMF]-x-[STA]-K-x(2)-
[FY].

NAME: Receptor tyrosine kinase class II signature.
CONSENSUS: [DN]-[LIV]-Y-x(3)-Y-Y-R.

NAME: Receptor tyrosine kinase class III signature.
CONSENSUS: G-x-H-x-N-[LIVM]-V-N-L-L-G-A-C-T.

NAME: Receptor tyrosine kinase class V signature 1.
CONSENSUS: F-x-[DN]-x-[GAW]-[GA]-C-[LIVM]-[SA]-[LIVM](2)-
[SA]-[LV]-[KRHQ]-[LIVA]-
CONSENSUS: x(3)-[KR]-C-[PSAW].

NAME: Receptor tyrosine kinase class V signature 2.
CONSENSUS: C-x(2)-[DE]-G-[DEQ]-W-x(2,3)-[PAQ]-[LIVMT]-[GT]-x-
C-x-C-x(2)-G-[HFY]-
CONSENSUS: [EQ].

NAME: Growth factor and cytokines receptors family signature 1.
CONSENSUS: C-[LVFYR]-x(7,8)-[STIVDN]-C-x-W.

NAME: Growth factor and cytokines receptors family signature 2.
CONSENSUS: [STGL]-x-W-[SG]-x-W-S.

NAME: TNFR/NGFR family cysteine-rich region signature.
CONSENSUS: C-x(4,6)-[FYH]-x(5,10)-C-x(0,2)-C-x(2,3)-C-
x(7,11)-C-x(4,6)-[DNEQSKP]-
CONSENSUS: x(2)-C.

NAME: TNFR/NGFR family cysteine-rich region domain.

NAME: Integrins alpha chain signature.
CONSENSUS: [FYWS]-[RK]-x-G-F-F-x-R.

NAME: Integrins beta chain cysteine-rich domain signature.
CONSENSUS: C-x-[GNQ]-x(1,3)-G-x-C-x-C-x(5)-C-x-C.

NAME: Natriuretic peptides receptors signature.
CONSENSUS: G-P-x-C-x-Y-x-A-A-x-V-x-R-x(3)-H-W.

NAME: Photosynthetic reaction center proteins signature.
CONSENSUS: [NH]-x(4)-P-x-H-x(5)-[SAG]-x(11)-[SAGC]-x-H-
[SAG](2).

NAME: Antenna complexes alpha subunits signature.
CONSENSUS: [LIVFAG]-x-[GASVW]-[LIVFA]-x-[IV]-H-x(3)-[LIVMW]-
[GSTAE]-[STANH]-x(1,3)-
CONSENSUS: [STN]-W-[LIVMFYW].

NAME: Antenna complexes beta subunits signature.
CONSENSUS: [EQ]-x(4)-H-x(5)-[GSTA]-x(3)-[FY]-x(3)-[AG]-x(2)-
[AV]-H-x(7)-P.

NAME: Photosystem I psaA and psaB proteins signature.
CONSENSUS: C-D-G-P-G-R-G-G-T-C.

NAME: Photosystem I psaG and psaK proteins signature.
CONSENSUS: G-F-x-[LIVM]-x-[DEA]-x(2)-[GA]-x-[GTA]-[SA]-x-G-H-
x-[LIVM]-[GA].

NAME: Phytochrome chromophore attachment site signature.
CONSENSUS: [RGS]-[GSA]-[PV]-H-x-C-H-x(2)-Y.

NAME: Phytochrome chromophore attachment site domain profile.

NAME: Speract receptor repeated domain signature.
CONSENSUS: G-x(5)-G-x(2)-E-x(6)-W-G-x(2)-C-x(3)-[FYW]-x(8)-C-
x(3)-G.

NAME: TonB-dependent receptor proteins signature 1.
CONSENSUS: <x(10,115)-[DENF]-[ST]-[LIVMF]-[LIVSTEQ]-V-x-
[AGP]-[STANEQPK].

NAME: TonB-dependent receptor proteins signature 2.
CONSENSUS: [LYGSTANQ]-x(3)-[GSTAENQ]-x-[PGE]-R-x-[LIVTYWA]-x-
[LIVMFTA]-[STAGNQ]-
CONSENSUS: [LIVMFYGTA]-x-[LIVMFYWGTADQ]-x-F.

NAME: Transmembrane 4 family signature.
CONSENSUS: G-x(3)-[LIVMF]-x(2)-[GSA]-[LIVMF](2)-G-C-x-[GA]-
[STA]-x(2)-[EG]-x(2)-
CONSENSUS: [CWN]-[LIVM](2).

NAME: Bacterial chemotaxis sensory transducers signature.
CONSENSUS: R-T-E-[EQ]-Q-x(2)-[SA]-[LIVM]-x-[HEQ]-T-A-A-S-M-E-
Q-L-T-A-T-V.

NAME: ER lumen protein retaining receptor signature 1.
CONSENSUS: G-I-S-x-[KR]-x-Q-x-L-[FY]-x-[LIV](2)-F-x(2)-R-Y.

NAME: ER lumen protein retaining receptor signature 2.
CONSENSUS: L-E-[SA]-V-A-I-[LM]-P-Q-L.

NAME: Ephrins signature.
CONSENSUS: [KR]-[LF]-[CST]-x-K-[IF]-Q-x-[FY]-[ST]-[PA]-x(3)-
G-x-E-F-x(5)-[FY](2)-
CONSENSUS: x(2)-[SA].

NAME: Granulins signature.
CONSENSUS: C-x-D-x(2)-H-C-C-P-x(4)-C.
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NAME: HBGF/FGF family signature. 

CONSENSUS: G-x-L-x-ESTAGP3-x (tn7)-EDE3-C-x-IEFM3-x-E-x (b ) - Y- 

NAME: PTN/MK heparin-binding protein family signature 1- 

CONSENSUS: S-EDE3-C-x-EDE3-bl-x-U-x(2)-C-x-P-x-ESN3-x-D-C-G- 
ILIVMA3-G-X-R-E-G- 

35 NAME: PTN/MK heparin-binding protein family signature 2- 

CONSENSUS : C-EKR3-ELIVM3-P-C-N-U-K-K-x-F-G-A-EDE3-C-K-Y-x-F- 
EEO-x-bJ-G-x-O 

NAME: Nerve growth factor family signature- 

40 CONSENSUS: G-C-EKR3-G-ELIV3-EDE3-X (3)-EYU3-x-S-x-C- 

NAME: Platelet-derived growth factor (PDGF) family 
signature- 

CONSENSUS: P-EPS3-C-V-x(3)-R-C-E6STA3-G-C-C 

45 

NAME: Small cytokines (intercrine/chemokine) C-x-C subfamily signature.

CONSENSUS: C-x-C-[LIVM]-x(5,6)-[LIVMFY]-x(2)-[RKSEQ]-x-[LIVM]-x(2)-[LIVM]-x(5)-[SAG]-x(2)-C-x(3)-[EQ]-[LIVM](2)-x(9,10)-C-L-[DN].

NAME: Small cytokines (intercrine/chemokine) C-C subfamily signature.

CONSENSUS: C-C-[LIFYT]-x(5,6)-[LI]-x(4)-[LIVMF]-x(2)-[FYW]-x(6,8)-C-x(3,4)-[SAG]-[LIVM](2)-[FL]-x(8)-C-[STA].

NAME: TGF-beta family signature.

CONSENSUS: [LIVM]-x(2)-P-x(2)-[FY]-x(4)-C-x-G-x-C.

NAME: TNF family signature.

CONSENSUS: [LV]-x-[LIVM]-x(3)-G-[LIVMF]-Y-[LIVMFY](2)-x(2)-[QEKHL]-[LIVMGT]-x-[LIVMFY].

NAME: TNF family profile. 

NAME: Wnt-1 family signature.

CONSENSUS: C-K-C-H-G-[LIVMT]-S-G-x-C.

NAME: Interferon alpha, beta and delta family signature.

CONSENSUS: [FYH]-[FY]-x-[GNRC]-[LIVM]-x(2)-[FY]-L-x(7)-[CY]-A-W.

NAME: Granulocyte-macrophage colony-stimulating factor signature.

CONSENSUS: C-P-[LP]-T-x-E-[ST]-x-C.
NAME: Interleukin-1 signature.

CONSENSUS: [FC]-x-S-[ASLV]-x(2)-P-x(2)-[FYLIV]-[LI]-[SCA]-T-x(7)-[LIVM].



NAME: Interleukin-2 signature.

CONSENSUS: T-E-[LF]-x(2)-L-x-C-L-x(2)-E-L.

NAME: Interleukins-4 and -13 signature.

CONSENSUS: L-x-E-[LIVM](2)-x(4,5)-[LIVM]-[TL]-x(5,7)-C-x(4)-[IVA]-x-[DNS]-[LIVMA].

NAME: Interleukin-6 / G-CSF / MGF signature.

CONSENSUS: C-x(9)-C-x(6)-G-L-x(2)-[FY]-x(3)-L.

NAME: Interleukin-7 and -9 signature.

CONSENSUS: N-x-[LAP]-[SCT]-F-L-K-x-L-L.



NAME: Interleukin-10 family signature.

CONSENSUS: [GS]-C-x(2)-[LV]-x(2)-[LIVM](2)-x-F-Y-L-x(2)-V.

NAME: LIF / OSM family signature.

CONSENSUS: [PST]-x(4)-F-[NQ]-x-K-x(3)-C-x-[LF]-L-x(2)-Y-[HK].



NAME: Macrophage migration inhibitory factor family signature.

CONSENSUS: [DE]-P-C-A-x(3)-[LIVM]-x-S-I-G-x-[LIVM]-G.



NAME: Adipokinetic hormone family signature.

CONSENSUS: Q-[LV]-[NT]-[FY]-[ST]-x(2)-W.

NAME: Bombesin-like peptides family signature.

CONSENSUS: W-A-x-G-[SH]-[LF]-M.



NAME: Calcitonin / CGRP / IAPP family signature.

CONSENSUS: C-[SAGDN]-[STN]-x(0,1)-[SA]-T-C-[VMA]-x(3)-[LYF]-x(3)-[LYF].

NAME: Corticotropin-releasing factor family signature. 




WO 01/98454 PCT/IB01/02050

CONSENSUS: [PQ]-x-[LIVM]-S-[LIVM]-x(2)-[PST]-[LIVMF]-x-[LIVM]-L-R-x(2)-[LIVM].

NAME: Crustacean CHH/MIH/GIH neurohormones family signature.

CONSENSUS: C-[DEN]-D-C-x-N-[LIV]-[FY]-R-x(7)-C-[KR]-x(2)-C.

NAME: Erythropoietin / thrombopoietin signature.

CONSENSUS: P-x(4)-C-D-x-R-[LIVM](2)-x-[KR]-x(14)-C.

NAME: Granins signature 1.

CONSENSUS: [DE]-[SN]-L-[SAN]-x(2)-[DE]-x-E-L.

NAME: Granins signature 2.

CONSENSUS: C-[LIVM](2)-E-[LIVM](2)-S-[DN]-[STA]-L-x-K-x-S-x(3)-[LIVM]-[STA]-x-E-C.

NAME: Galanin signature.

CONSENSUS: G-W-T-L-N-S-A-G-Y-L-L-G-P-H.

NAME: Gastrin / cholecystokinin family signature.

CONSENSUS: Y-x(0,1)-[GD]-W-M-[DR]-F.

NAME: Glucagon / GIP / secretin / VIP family signature.

CONSENSUS: [YH]-[STAIVGD]-[DEQ]-[AGF]-[LIVMSTE]-[FYLR]-x-[DENSTAK]-[DENSTA]-[LIVMFYG]-x(1,2)-[KREQL]-[KRDENQL]-[LVFYWG]-[LIVQ].
NAME: Glycoprotein hormones alpha chain signature 1.

CONSENSUS: C-x-G-C-C-[FY]-S-R-A-[FY]-P-T-P.

NAME: Glycoprotein hormones alpha chain signature 2.

CONSENSUS: N-H-T-x-C-x-C-x-T-C-x(2)-H-K.



NAME: Glycoprotein hormones beta chain signature 1.

CONSENSUS: C-[STAGM]-G-[HFYL]-C-x-[ST].



NAME: Glycoprotein hormones beta chain signature 2.

CONSENSUS: [PA]-V-A-x(2)-C-x-C-x(2)-C-x(4)-[STD]-[DEY]-C-x(6,8)-[PGSTAVM]-x(2)-C.

NAME: Gonadotropin-releasing hormones signature.

CONSENSUS: Q-H-[FYW]-S-x(4)-P-G.



NAME: Insulin family signature.

CONSENSUS: C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C.
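Many consensus lines in this listing were damaged when the document was digitised. A mechanical syntax check is a quick way to flag entries that still contain stray characters. The following is a minimal sketch built around a hypothetical invalid_elements helper. It accepts only these forms:
- the twenty standard residue letters and x;
- bracketed sets, optionally containing the > terminus symbol;
- braced exclusion sets such as the {P} in the insulin signature;
- an optional (n) or (n,m) count;
- the < and > anchors.

Any other element is returned for review.

```python
import re

# The twenty standard amino-acid one-letter codes.
AA = "ACDEFGHIKLMNPQRSTVWY"

# One well-formed PROSITE-style element, e.g. C, x(2), [LIVM](2), {P}, <M or [G>].
ELEMENT = re.compile(
    r"<?"                          # optional N-terminal anchor
    + r"(?:x|[" + AA + r"]"        # any residue (x) or one fixed residue
    + r"|\[[" + AA + r">]+\]"      # allowed set, e.g. [LIVM] or [G>]
    + r"|\{[" + AA + r"]+\})"      # excluded set, e.g. {P}
    + r"(?:\(\d+(?:,\d+)?\))?"     # optional count, e.g. (2) or (3,5)
    + r">?"                        # optional C-terminal anchor
)


def invalid_elements(pattern: str) -> list[str]:
    """Return the hyphen-separated elements of a pattern that are not well-formed.

    Splitting on '-' means a damaged count that itself contains a hyphen
    is reported as two fragments.
    """
    body = pattern.strip().rstrip(".")
    return [element for element in body.split("-") if not ELEMENT.fullmatch(element)]


if __name__ == "__main__":
    print(invalid_elements("C-C-{P}-x(2)-C-[STDNEKPI]-x(3)-[LIVMFS]-x(3)-C."))  # []
    print(invalid_elements("C-C-ELIFY3-x(S)"))  # ['ELIFY3', 'x(S)']
```

The check is purely syntactic. A well-formed element can still carry a wrong residue set or count, so it catches extraction debris but cannot confirm that a signature is correct.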

NAME: Natriuretic peptides signature.

CONSENSUS: C-F-G-x(3)-[DM]-R-I-x(3)-S-x(2)-G-C.

NAME: Neurohypophysial hormones signature.

CONSENSUS: C-[LIFY](2)-x-N-[CS]-P-x-G.



NAME: Neuromedin U signature.

CONSENSUS: F-[LIVMF]-F-R-P-R-N.

NAME: Endogenous opioids neuropeptides precursors signature.

CONSENSUS: C-x(3)-C-x(5)-C-x(2)-[KRH]-x(6,7)-[LIF]-[DN]-x(3)-C-x-[LIVM]-[EQ]-C-[EQ]-x(4)-W-x(2)-C.

NAME: Pancreatic hormone family signature.

CONSENSUS: [FY]-x(3)-[LIVM]-x(2)-Y-x(3)-[LIVMFY]-x-R-x-R-[YF].

NAME: Parathyroid hormone family signature.

CONSENSUS: V-S-E-x-Q-x(2)-H-x(2)-G.

NAME: Pyrokinins signature.

CONSENSUS: F-[GSTV]-P-R-L-[G>].

NAME: Somatotropin, prolactin and related hormones signature 1.

CONSENSUS: C-x-[ST]-x(2)-[LIVMFYW]-x-[LIVMSTA]-P-x(5)-[TALIV]-x(7)-[LIVMFY]-x(6)-[LIVMFY]-x(2)-[STA]-W.

NAME: Somatotropin, prolactin and related hormones signature 2.

CONSENSUS: C-[LIVMFY]-x(2)-D-[LIVMFYSTA]-x(5)-[LIVMFY]-x(2)-[LIVMFYT]-x(2)-C.

NAME: Tachykinin family signature.

CONSENSUS: F-[IVFY]-G-[LM]-M-[G>].

NAME: Thymosin beta-4 family signature.

CONSENSUS: K-L-K-K-T-E-T-Q-E-K-N.

NAME: Urotensin II signature.

CONSENSUS: C-F-W-K-Y-C.
NAME: Cecropin family signature.

CONSENSUS: W-x(0,2)-[KDN]-x(2)-K-[KRE]-[LI]-E-[RKN].

NAME: Mammalian defensins signature.

CONSENSUS: C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C.
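Repeat ranges such as x(3,5) mean that a signature can cover stretches of different length. This matters when, for instance, a sequence is scanned in fixed-size windows. The sketch below uses a hypothetical motif_length_range helper that adds up the smallest and largest count of every element. It assumes a well-formed pattern and treats each bracketed or braced set as one residue.

```python
import re


def motif_length_range(pattern: str) -> tuple[int, int]:
    """Return the shortest and longest stretch of sequence a pattern can match."""
    shortest = longest = 0
    for element in pattern.strip().rstrip(".").split("-"):
        count = re.search(r"\((\d+)(?:,(\d+))?\)>?$", element)
        if count:
            low = int(count.group(1))
            high = int(count.group(2)) if count.group(2) else low
        else:
            low = high = 1  # a plain residue, x, or [set] occupies one position
        shortest += low
        longest += high
    return shortest, longest


if __name__ == "__main__":
    # Mammalian defensins signature.
    print(motif_length_range("C-x-C-x(3,5)-C-x(7)-G-x-C-x(9)-C-C."))  # (28, 30)
```

For the defensin signature this gives 28 to 30 residues. The ranges are only as reliable as the counts in each consensus line.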



NAME: Arthropod defensins signature.

CONSENSUS: C-x(2,3)-[HN]-C-x(3,4)-[GR]-x(2)-G-G-x-C-x(4,7)-C-x-C.

NAME: Cathelicidins signature 1.

CONSENSUS: Y-x-[ED]-x-V-x-[RQ]-A-[LIVMA]-[DGE]-x-[LIVMFY]-N-[EQ].

NAME: Cathelicidins signature 2.

CONSENSUS: F-x-[LIVM]-K-E-T-x-C-x(10)-C-x-F-[KR]-[KE].

NAME: Endothelin family signature.

CONSENSUS: C-x-C-x(4)-D-x(2)-C-x(2)-[FY]-C.



NAME: Plant thionins signature.

CONSENSUS: C-C-x(5)-R-x(2)-[FY]-x(2)-C.

NAME: Gamma-thionins family signature.

CONSENSUS: [KR]-x-C-x(3)-[SV]-x(2)-[FYWH]-x-[GF]-x-C-x(5)-C-x(3)-C.
NAME: Snake toxins signature.

CONSENSUS: G-C-x(1,3)-C-P-x(8,10)-C-C-x(2)-[PDEN].

NAME: Myotoxins signature.

CONSENSUS: K-x-C-H-x-K-x(2)-H-C-x(2)-K-x(3)-C-x(8)-K-x(2)-C-x(2)-[RK]-x-K-C-C-K-K.

NAME: Scorpion short toxins signature.

CONSENSUS: C-x(3)-C-x(6,11)-[GAS]-K-C-[IMNT]-x(3)-C-x-C.

NAME: Heat-stable enterotoxins signature.

CONSENSUS: C-C-x(2)-C-C-x-P-A-C-x-G-C.

NAME: Aerolysin type toxins signature.

CONSENSUS: [KT]-x(2)-N-W-x(2)-T-[DN]-T.

NAME: Shiga/ricin ribosomal inactivating toxins active site signature.

CONSENSUS: [LIVMA]-x-[LIVMSTA](2)-x-E-[SAGV]-[STAL]-R-[FY]-[RKNQS]-x-[LIVM]-[EQS]-x(2)-[LIVMF].

NAME: Channel forming colicins signature.

CONSENSUS: T-x(2)-W-x-P-[LIVMFY](3)-x(2)-E.

NAME: Hok/gef family cell toxic proteins signature.

CONSENSUS: [LIVMA](4)-C-[LIVMFA]-T-[LIVMA](2)-x(4)-[LIVM]-x-[RG]-x(2)-L-[CY].

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 1.

CONSENSUS: Y-G-G-[LIV]-T-x(4)-N.

NAME: Staphylococcal enterotoxin/Streptococcal pyrogenic exotoxin signature 2.

CONSENSUS: K-x(2)-[LIV]-x(4)-[LIV]-D-x(3)-R-x(2)-L-x(5)-[LIV]-Y.

NAME: Thiol-activated cytolysins signature.

CONSENSUS: [RK]-E-C-T-G-L-x-W-E-W-W-[RK].

NAME: Membrane attack complex components / perforin signature.

CONSENSUS: Y-x(6)-[FY]-G-T-H-[FY].

NAME: Pancreatic trypsin inhibitor (Kunitz) family signature.

CONSENSUS: F-x(3)-G-C-x(6)-[FY]-x(5)-C.
NAME: Bowman-Birk serine protease inhibitors family signature.

CONSENSUS: C-x(5,6)-[DENQKRHSTA]-C-[PASTDH]-[PASTDK]-[ASTDV]-C-[NDKS]-[DEKRHSTA]-C.

NAME: Kazal serine protease inhibitors family signature.

CONSENSUS: C-x(7)-C-x(6)-Y-x(3)-C-x(2,3)-C.
NAME: Soybean trypsin inhibitor (Kunitz) protease inhibitors family signature.

CONSENSUS: [LIVM]-x-D-x-[EDNTY]-[DG]-[RKHDENQ]-x-[LIVM]-x(5)-Y-x-[LIVM].

NAME: Serpins signature.

CONSENSUS: [LIVMFY]-x-[LIVMFYAC]-[DNQ]-[RKHQS]-[PST]-F-[LIVMFY]-[LIVMFYC]-x-[LIVMFAH].

NAME: Potato inhibitor I family signature.

CONSENSUS: [FYW]-P-[EAH]-[LIV](2)-G-x(5)-[STAGV]-x(2)-A.



NAME: Squash family of serine protease inhibitors signature.

CONSENSUS: C-P-x(5)-C-x(2)-D-x-D-C-x(3)-C-x-C.

NAME: Streptomyces subtilisin-type inhibitors signature.

CONSENSUS: C-x-P-x(2,3)-G-x-H-P-x(4)-A-C-[ATD]-x-L.

NAME: Cysteine proteases inhibitors signature.

CONSENSUS: [GSTEQKRV]-Q-[LIVT]-[VAF]-[SAGQ]-G-x-[LIVMNK]-x(2)-[LIVMFY]-x-[LIVMFYA]-[DENQKRHSIV].

NAME: Tissue inhibitors of metalloproteinases signature.

CONSENSUS: C-x-C-x-P-x-H-P-Q-x-A-F-C.

NAME: Cereal trypsin/alpha-amylase inhibitors family signature.

CONSENSUS: C-x(7)-[SAGD]-x(4)-[SPAL]-[LF]-x(2)-C-[RH]-x-[LIVMFY](2)-x(3,4)-C.

NAME: Alpha-2-macroglobulin family thiolester region signature.

CONSENSUS: [PG]-x-[GS]-C-[GA]-E-[EQ]-x-[LIVM].

NAME: Disintegrins signature.

CONSENSUS: C-x(2)-G-x-C-C-x-[NQRS]-C-x-[FM]-x(6)-C-[RK].

NAME: Lambdoid phages regulatory protein CII signature.

CONSENSUS: E-S-x-L-x-R-x(2)-[KR]-x-L-x(4)-[KR](2)-x(2)-[DE]-x-L.

NAME: Chaperonins cpn60 signature.

CONSENSUS: A-[AS]-x-[DEQ]-E-x(4)-G-G-[GA].

NAME: Chaperonins cpn10 signature.

CONSENSUS: [LIVMFY]-x-P-[ILT]-x-[DEN]-[KR]-[LIVMFA](3)-[KREQ]-x(8,11)-[SG]-x-[LIVMFY](3).

NAME: Chaperonins TCP-1 signature 1.

CONSENSUS: [RKEL]-[ST]-x-[LMFY]-G-P-x-[GSA]-x-x-K-[LIVMF](2).

NAME: Chaperonins TCP-1 signature 2.

CONSENSUS: [LIVM]-[TS]-[NK]-D-[GA]-[AVNHK]-[TAV]-[LIVM](2)-x(2)-[LIVM]-x-[LIVM]-x-[SNH]-[PAH].
NAME: Chaperonins TCP-1 signature 3.

CONSENSUS: Q-[DE]-x-x-[LIVMGTA]-[GA]-D-G-T.

NAME: Heat shock hsp20 proteins family profile.

NAME: Heat shock hsp70 proteins family signature 1.

CONSENSUS: [IV]-D-L-G-T-[ST]-x-[SC].

NAME: Heat shock hsp70 proteins family signature 2.

CONSENSUS: [LIVMF]-[LIVMFY]-[DN]-[LIVMFS]-G-[GSH]-[GS]-[AST]-x(3)-[ST]-[LIVM]-[LIVMFC].

NAME: Heat shock hsp70 proteins family signature 3.

CONSENSUS: [LIVMY]-x-[LIVMF]-x-G-G-x-[ST]-x-[LIVM]-P-x-[LIVM]-x-[DEQKRSTA].

NAME: Heat shock hsp90 proteins family signature.

CONSENSUS: Y-x-[NQH]-K-[DE]-[IVA]-F-L-R-[ED].

NAME: Chaperonins clpA/B signature 1.

CONSENSUS: D-[AI]-[SGA]-N-[LIVMF](2)-K-[PT]-x-L-x(2)-G.

NAME: Chaperonins clpA/B signature 2.

CONSENSUS: R-[LIVMFY]-D-x-S-E-[LIVMFY]-x-E-[KR]-x-[STA]-x-[STA]-[KR]-[LIVM]-x-G-[STA].

NAME: Nt-dnaJ domain signature.

CONSENSUS: [FY]-x(2)-[LIVMA]-x(3)-[FYWHNT]-[DENQSA]-x-L-x-[DN]-x(3)-[KR]-x(2)-[FYI].



NAME: dnaJ domain profile.

NAME: CXXCXGXG dnaJ domain signature.

CONSENSUS: C-[DEGSTHKR]-x-C-x-G-x-[GK]-[AGSDM]-x(2)-[GSNKR]-x(4,6)-C-x(2,3)-C-x-G-x-G.

NAME: grpE protein signature.

CONSENSUS: [FL]-[DN]-[PHEA]-x(2)-[HM]-x-A-[LIVMTN]-x(16,20)-G-[FY]-x(3)-[DEG]-x(2)-[LIVM]-[RI]-x-[SA]-x-V-x-[IV].

NAME: Bacterial type II secretion system protein C signature.

CONSENSUS: P-x(6)-F-x(4)-L-x(3)-D-[LIVM]-A-[LIVM]-x-[LIVM]-N-x-[LIVM]-x-L.

NAME: Bacterial type II secretion system protein D signature.

CONSENSUS: [GR]-[DEQKG]-[STVM]-[LIVMA](3)-[GA]-G-[LIVMFY]-x(10)-[LIVM]-P-[LIVMFYWGS]-[LIVMF]-[GSAE]-x-[LIVM]-P-[LIVMFYW](2)-x(2)-[LV]-F.

NAME: Bacterial type II secretion system protein E signature.

CONSENSUS: [LIVM]-R-x(2)-P-D-x-[LIVM](3)-G-E-[LIVM]-R-D.

NAME: Bacterial type II secretion system protein F signature.

CONSENSUS: [KRQ]-[LIVMA]-x(2)-[SAIV]-[LIVM]-x-[TY]-P-x(2)-[LIVM]-x(3)-[STAGV]-x(6)-[LMY]-x(3)-[LIVMF](2)-P.

NAME: Bacterial type II secretion system protein N signature.

CONSENSUS: G-T-L-W-x-G-x(11)-L-x(4)-W.

NAME: Bacterial export FHIPEP family signature.

CONSENSUS: R-[LIVM]-[GSA]-E-V-[GSA]-A-R-F-[STV]-L-D-[GSA]-M-P-G-K-Q-M-[GSA]-I-D-[GSA]-D.



NAME: Protein secA signature.

CONSENSUS: [IV]-x-[IV]-[SA]-T-[NQ]-M-A-G-R-G-x-D-I-x-L.
NAME: Protein secY signature 1.

CONSENSUS: [GST]-[LIVMF]-x-[LIVM]-G-[LIVM]-x-P-[LIVMFY](3)-x-[AS]-[GSTQ]-[LIVMFAT](3)-Q-[LIVMFA](2).

NAME: Protein secY signature 2.

CONSENSUS: [LIVMFYW](2)-x-[DE]-x-[LIVMF]-[STN]-x(2)-G-[LIVMF]-[GST]-[NST]-G-x-[GST]-[LIVMF](3).

NAME: Protein secE/sec61-gamma signature.

CONSENSUS: [LIVMFY]-x(2)-[DENQGA]-x(4)-[LIVMTA]-x-[KRV]-x(2)-[KW]-P-x(3)-[SEQ]-x(7)-[LIVT]-[LIVGA]-[LIVFGAST].

NAME: Gram-negative pili assembly chaperone signature.

CONSENSUS: [LIVMFY]-[APN]-x-[DNS]-[KREQ]-E-[STRQ]-[LIVMAR]-x-[FYWT]-x-[NC]-[LIVM]-x(2)-[LIVM]-P-[PAS].

NAME: Fimbrial biogenesis outer membrane usher protein signature.

CONSENSUS: [VL]-[PASQ]-[PAS]-G-[PAD]-[FY]-x-[LI]-[DNQSTAP]-[DNH]-[LIVMFY].

NAME: SRP54-type proteins GTP-binding domain signature.

CONSENSUS: P-[LIVM]-x-[FYL]-[LIVMAT]-[GS]-x-[GS]-[EQ]-x(4)-[LIVMF].

NAME: Cytochrome c oxidase assembly factor COX10/ctaB/cyoE signature.

CONSENSUS: [ED]-x-D-x(2)-M-x-R-T-x(2)-R-x(4)-G.

NAME: Cyclin-dependent kinases regulatory subunits signature 1.

CONSENSUS: Y-S-x-[KR]-Y-x-[DE](2)-x-[FY]-E-Y-R-H-V-x-[LV]-[PT]-[KRP].
NAME: Cyclin-dependent kinases regulatory subunits signature 2.

CONSENSUS: H-x-P-E-x-H-[IV]-L-L-F-[KR].

NAME: Pentaxin family signature.

CONSENSUS: H-x-C-x-[ST]-W-x-[ST].

NAME: Immunoglobulins and major histocompatibility complex proteins signature.

CONSENSUS: [FY]-x-C-x-[VA]-x-H.

NAME: Prion protein signature 1.

CONSENSUS: A-G-A-A-A-A-G-A-V-V-G-G-L-G-G-Y.

NAME: Prion protein signature 2.

CONSENSUS: E-x-[ED]-x-K-[LIVM](2)-x-[KR]-[LIVM](2)-x-[QE]-M-C-x(2)-Q-Y.

NAME: Cyclins signature.

CONSENSUS: R-x(2)-[LIVMSA]-x(2)-[FYWS]-[LIVM]-x(8)-[LIVMFC]-x(4)-[LIVMFYA]-x(2)-[STAGC]-[LIVMFYQ]-x-[LIVMFYC]-[LIVMFY]-D-[RKH]-[LIVMFYW].

NAME: Proliferating cell nuclear antigen signature 1.

CONSENSUS: [GA]-[LIVMF]-x-[LIVMA]-x-[SAV]-[LIVM]-D-x-[NSAE]-[HKR]-[VI]-x-[LY]-[VGA]-x-[LIVM]-x-[LIVM]-x(4)-F.

NAME: Proliferating cell nuclear antigen signature 2.

CONSENSUS: [RKA]-C-[DE]-[RH]-x(3)-[LIVMF]-x(3)-[LIVM]-x-[SGAN]-[LIVMF]-x-K-[LIVMF](2).

NAME: Actin-depolymerizing proteins signature.

CONSENSUS: P-[DE]-x-[SA]-x-[LIVMT]-[KR]-x-[KR]-M-[LIVM]-[YA]-[STA](3)-x(3)-[LIVMF]-[KR].

NAME: BCL2-like apoptosis inhibitors (spans part of BH3, BH1 and BH2).

NAME: Apoptosis regulator, Bcl-2 family BH1 domain signature.

CONSENSUS: [LVME]-[FT]-x-[GSD]-[GL]-x(1,2)-[NS]-[WY]-G-R-[LIV]-[LIVC]-[GAT]-[LIVMF](2)-x-F-[GSAE]-[GSARY].

NAME: Apoptosis regulator, Bcl-2 family BH2 domain signature.

CONSENSUS: W-[LIM]-x(3)-[GR]-G-[WQ]-[DENSAV]-x-[FLGA]-[LIVFT].

NAME: Apoptosis regulator, Bcl-2 family BH3 domain signature.

CONSENSUS: [LIVAT]-x(3)-L-[KARQ]-x-[IVAL]-G-D-[DESG]-[LIMFV]-[DENSHQ]-[LVSHRQ]-[NSR].
NAME: Apoptosis regulator, Bcl-2 family BH4 domain signature.

CONSENSUS: [DS]-[NT]-R-[AE]-[LI]-V-x-[KD]-[FY]-[LIV]-[GHS]-Y-K-L-[SR]-Q-[RK]-G-[HY]-x-W.

NAME: Apoptosis regulator, Bcl-2 family BH4 domain profile.

NAME: Arrestins signature.

CONSENSUS: [FY]-R-Y-G-x-[DE](2)-x-[DE]-[LIVM](2)-G-[LIVM]-x-F-x-[KR]-[DEQ]-[LIVM].

NAME: AAA-protein family signature.

CONSENSUS: [LIVMT]-x-[LIVMT]-[LIVMF]-x-[GATMC]-[ST]-[NS]-x(4)-[LIVM]-D-x-A-[LIFA]-x-R.

NAME: Ubiquitin domain signature.

CONSENSUS: K-x(2)-[LIVM]-x-[DESAK]-x(3)-[LIVM]-[PA]-x(3)-Q-x-[LIVM]-[LIVMC]-[LIVMFY]-x-G-x(4)-[DE].

NAME: Ubiquitin domain profile.

NAME: ADP-ribosylation factors family signature.

CONSENSUS: [HRQT]-x-[FYWI]-x-[LIVM]-x(4)-A-x(2)-G-x(2)-[LIVM]-x(2)-[GSA]-[LIVMF]-x-[WK]-[LIVM].

NAME: GTP-binding nuclear protein ran signature.

CONSENSUS: D-T-A-G-Q-E-K-[LF]-G-G-L-R-[DE]-G-Y-Y.

NAME: SAR1 family signature.

CONSENSUS: R-x-[LIVM]-E-V-F-M-C-S-[LIVM](2)-x-[KRQ]-x-G-Y-x-E-[AG]-[FI]-x-W-[LIVM]-x-Q-Y.

NAME: Band 7 protein family signature.

CONSENSUS: R-x(2)-[LIV]-[SAN]-x(6)-[LIV]-D-x(2)-T-x(2)-W-G-[LIV]-[KRH]-[LIV]-x-[KR]-[LIV]-E-[LIV]-[KR].

NAME: Trp-Asp (WD) repeats signature.

CONSENSUS: [LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x(2)-[DN]-x(2)-[LIVMWSTAC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN].

NAME: G-protein gamma subunit profile.

NAME: Ras GTPase-activating proteins signature.

CONSENSUS: [GSN]-x-[LIVMF]-[FY]-[LIVMFY]-R-[LIVMFY](2)-[GACN]-P-[AV]-[LIV](2)-[SGAN]-P.

NAME: Ras GTPase-activating proteins profile.
NAME: Guanine-nucleotide dissociation stimulators CDC24 family signature.

CONSENSUS: L-x(2)-[LIVMFYW]-L-x(2)-P-[IVM]-x(2)-[LIVM]-x-[KRS]-x(2)-L-x-[LIVM]-x-[DEQ]-[LIVM]-x(3)-[ST].

NAME: Guanine-nucleotide dissociation stimulators CDC25 family signature.

CONSENSUS: [GAP]-[CT]-V-P-[FY]-x(4)-[LIVMFY]-x-[DN]-[LIVM].

NAME: MARCKS family signature 1.

CONSENSUS: G-Q-E-N-G-H-V-[KR].



NAME: MARCKS family phosphorylation site domain.

CONSENSUS: E-T-P-K(5)-x(0,1)-F-S-F-K-K-x-F-K-L-S-G-x-S-F-K-[KR]-[NS]-[KR]-K-E.



NAME: Stathmin family signature 1.

CONSENSUS: P-[KA]-[KR](2)-[DE]-x-S-L-[EG]-E.

NAME: Stathmin family signature 2.

CONSENSUS: A-E-K-R-E-H-E-[KR]-E-V.



NAME: GTP-binding elongation factors signature.

CONSENSUS: D-[KRSTGANQFYW]-x(3)-E-[KRAQ]-x-[RKQD]-[GC]-[IVMK]-[ST]-[IV]-x(2)-[GSTACKRNQ].

NAME: Elongation factor 1 beta/beta'/delta chain signature 1.

CONSENSUS: [DE]-[DEG]-[DE](2)-[LIVMF]-D-L-F-G.

NAME: Elongation factor 1 beta/beta'/delta chain signature 2.

CONSENSUS: V-Q-S-x-D-[LIVM]-x-A-[FWM]-[NQ]-K-[LIVM].

NAME: Elongation factor 1 gamma chain profile.

NAME: Elongation factor Ts signature 1.

CONSENSUS: L-R-x(2)-T-[GDN]-x-[GS]-[LIVMF]-x(0,1)-[DENKAC]-x-K-[KRNEQS]-[AV]-L.



NAME: Elongation factor Ts signature 2.

CONSENSUS: E-[LIVM]-N-[SCV]-[QE]-T-D-F-V-[SA]-[KRN].

NAME: Elongation factor P signature.

CONSENSUS: K-x-A-x(4)-G-x(2)-[LIVM]-x-V-P-x(2)-[LIV]-x(2)-G.



NAME: Eukaryotic initiation factor 1A signature.

CONSENSUS: [IM]-x-G-x-[GS]-[KRH]-x(4)-[CL]-x-D-G-x(2)-R-x(2)-[RH]-I-x-G.

NAME: Eukaryotic initiation factor 4E signature.

CONSENSUS: [DE]-[IFY]-x(2)-F-[KR]-x(2)-[LIVM]-x-P-x-W-E-[DV]-x(5)-G-G-[KR]-W.

NAME: Eukaryotic initiation factor 5A hypusine signature.

CONSENSUS: [PT]-G-K-H-G-x-A-K.
NAME: Initiation factor 2 signature.

CONSENSUS: G-x-[LIVM]-x(2)-L-[KR]-[KRHNS]-x-K-x(5)-[LIVM]-x(2)-G-x-[DEN]-C-G.

NAME: Initiation factor 3 signature.

CONSENSUS: [KR]-[LIVM](2)-[DN]-[FY]-[GSN]-[KR]-[LIVMFYS]-x-[FY]-[DEQT]-x(2)-[KR].

NAME: Translation initiation factor SUI1 signature.

CONSENSUS: [LIVM]-[EQ]-[LIVM]-Q-G-[DEN]-[KHQ]-[KRV].

NAME: Prokaryotic-type class I peptide chain release factors signature.

CONSENSUS: [AR]-[STA]-x-G-x-G-G-Q-[HNGCS]-V-N-x(3)-[ST]-A-[IV].



NAME: Transcription termination factor nusG signature. 

CONSENSUS: [LIVM]-F-G-[KRW]-x-T-P-[IV]-x-[LIVM].

NAME: Calponin family repeat.

CONSENSUS: [LIVM]-x-[LS]-Q-[MAS]-G-[STY]-[NT]-[KRQ]-x(2)-[STN]-Q-x-G-x(3,4)-G.

NAME: CAP protein signature 1.

CONSENSUS: [LIVM](2)-x-R-L-[DE]-x(4)-R-L-E.
NAME: CAP protein signature 2.

CONSENSUS: D-[LIVMFY]-x-E-x-[PA]-x-P-E-Q-[LIVMFY]-K.
NAME: Calreticulin family signature 1.

CONSENSUS: [KRHN]-x-[DEQN]-[DEQNK]-x(3)-C-G-G-[AG]-[FY]-[LIVM]-[KN]-[LIVMFY](2).



NAME: Calreticulin family signature 2.

CONSENSUS: [LIVM](2)-F-G-P-D-x-C-[AG].



NAME: Calreticulin family repeated motif signature.

CONSENSUS: [IV]-x-D-x-[DENST]-x(2)-K-P-[DEH]-D-W-[DEN].

NAME: Calsequestrin signature 1.

CONSENSUS: [EQ]-[DE]-G-L-[DN]-F-P-x-Y-D-G-x-D-R-V.



NAME: Calsequestrin signature 2.

CONSENSUS: [DE]-L-E-D-W-[LIVM]-E-D-V-L-x-G-x-[LIVM]-N-T-E-D-D-D.

NAME: S-100/ICaBP type calcium binding protein signature.

CONSENSUS: [LIVMFYW](2)-x(2)-[LK]-D-x(3)-[DN]-x(3)-[DNSG]-[FY]-x-[ES]-[FYVC]-x(2)-[LIVMFS]-[LIVMF].



NAME: Hemolysin-type calcium-binding region signature.

CONSENSUS: D-x-[LI]-x(4)-G-x-D-x-[LI]-x-G-G-x(3)-D.

NAME: HlyD family secretion proteins signature.

CONSENSUS: [LIVM]-x(2)-G-[LM]-x(3)-[STGAV]-x-[LIVMT]-x-[LIVMT]-[GE]-x-[KR]-x-[LIVMFYW](5)-x-[LIVMFYW](3).

NAME: P-II protein uridylation site.

CONSENSUS: Y-[KR]-G-[AS]-[AE]-Y.


NAME: P-II protein C-terminal region signature.

CONSENSUS: [ST]-x(3)-G-[DY]-G-[KR]-[IV]-[FW]-[LIVM]-x(2)-[LIVM].

NAME: 14-3-3 proteins signature 1.

CONSENSUS: R-N-L-[LIV]-S-[VG]-[GA]-Y-[KN]-N-[IVA].

NAME: 14-3-3 proteins signature 2.

CONSENSUS: Y-K-[DE]-S-T-L-I-[IM]-Q-L-[LF]-[RHC]-D-N-[LF]-T-[LS]-W-[TAN]-[SAD].

NAME: ATP1G1 / PLM / MAT8 family signature.

CONSENSUS: [DNS]-x-F-x-Y-D-x(2)-[ST]-[LIVM]-[RQ]-x(2)-G.

NAME: BTG1 family signature 1.

CONSENSUS: Y-x(2)-[HP]-W-[FY]-[AP]-E-x-P-x-K-G-x-[GA]-[FY]-R-C-[IV]-[RH]-[IV].

NAME: BTG1 family signature 2.

CONSENSUS: [LV]-P-x-[DE]-[LM]-[ST]-[LIVM]-W-[IV]-D-P-x-E-V-[SC]-x-[RQ]-x-G-E.

NAME: Cullin family signature.

CONSENSUS: [LIV]-K-x(2)-[LIV]-x(2)-L-I-[DEQ]-[KRHNQ]-x-Y-[LIVM]-x-R-x(6,7)-[FY]-x-Y-x-[SA].

NAME: Cullin family profile. 

NAME: Enhancer of rudimentary signature.

CONSENSUS: Y-D-I-[SA]-x-L-[FY]-x-F-[IV]-D-x(3)-D-[LIV]-S.
NAME: G10 protein signature 1.

CONSENSUS: L-C-C-x-[KR]-C-x(4)-[DE]-x-N-x(4)-C-x-C-R-V-P.

NAME: G10 protein signature 2.

CONSENSUS: C-x-H-C-G-C-[KRH]-G-C-[SA].



NAME: Glucokinase regulatory protein family signature.

CONSENSUS: G-[PA]-E-x-[LIV]-[STA]-G-S-[ST]-R-[LIVM]-K-[STGA](3)-x(2)-K.



NAME: GTP1/OBG family signature.

CONSENSUS: D-[LIVM]-P-G-[LIVM](2)-[DEY]-[GN]-A-x(2)-G-x-G.

NAME: HIT family signature.

CONSENSUS: [NQA]-x(4)-[GAV]-x-[QF]-x-[LIVM]-x-H-[LIVMFYT]-H-[LIVMFT]-H-[LIVMF](2)-[PSGA].

NAME: Caseins alpha/beta signature.

CONSENSUS: C-L-[LV]-A-x-A-[LVF]-A.
NAME: Clathrin adaptor complexes medium chain signature 1.

CONSENSUS: [IVT]-[GSP]-W-R-x(2,3)-[GAD]-x(2)-[HY]-x(2)-N-x-[LIVMAFY](3)-D-[LIVM]-[LIVMT]-E.

NAME: Clathrin adaptor complexes medium chain signature 2.

CONSENSUS: [LIV]-x-F-I-P-P-x-G-x-[LIVMFY]-x-L-x(2)-Y.

NAME: Clathrin adaptor complexes small chain signature.

CONSENSUS: [LIVM](2)-Y-[KR]-x(4)-L-Y-F.

NAME: Ependymins signature 1.

CONSENSUS: F-E-E-G-x-[LIVMF]-Y-[ED]-I-D-x(2)-N-[QE]-S-C-[RKH](2).

NAME: Ependymins signature 2.

CONSENSUS: [QE]-[LIVMA]-F-x(2)-P-[STA]-[FY]-C-[DE]-[GA]-[LIVM]-x(2)-[DE](2).

NAME: Syntaxin / epimorphin family signature.

CONSENSUS: [RQ]-x(3)-[LIVMA]-x(2)-[LIVM]-[ESH]-x(2)-[LIVMT]-x-[DEVM]-[LIVM]-x(2)-[LIVM]-[FS]-x(2)-[LIVM]-x(3)-[LIVT]-x(2)-Q-[GADEQ]-x(2)-[LIVM]-[DNQT]-x-[LIVMF]-[DESV]-x(2)-[LIVM].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 1.

CONSENSUS: [GDER]-H-[FYWH]-T-Q-[LIVM](2)-W-x(2)-[STN].

NAME: Extracellular proteins SCP/Tpx-1/Ag5/PR-1/Sc7 signature 2.

CONSENSUS: [LIVMFYH]-[LIVMFY]-x-C-[NQRHS]-Y-x-[PARH]-x-[GL]-N-[LIVMFYWDN].

NAME: Fetuin family signature 1.

CONSENSUS: C-x(5,6)-C-x(10)-C-x(13)-C-x(17,18)-C-x(13)-C-x(2)-C-x(58)-C-x(10,11)-C-x(10,12)-C-x(16,22)-C.

NAME: Fetuin family signature 2. 

CONSENSUS: L-E-T-x-C-H-x-L-D-P-T-P. 

NAME: Legume lectins beta-chain signature.

CONSENSUS: [LIV]-[STAG]-V-[DEQV]-[FLI]-D-[ST].

NAME: Legume lectins alpha-chain signature.

CONSENSUS: [LIV]-x-[EDQ]-[FYWKR]-V-x-[LIV]-G-[LF]-[ST].

NAME: Vertebrate galactoside-binding lectin signature.

CONSENSUS: W-[GEK]-x-[EQ]-x-[KRE]-x(3,6)-[PCTF]-[LIVMF]-[NQEGSKV]-x-[GH]-x(3)-[DENKHS]-[LIVMFC].

NAME: Lysosome-associated membrane glycoproteins duplicated domain signature.

CONSENSUS: [STA]-C-[LIVM]-[LIVMFYW]-A-x-[LIVMFYW]-x(3)-[LIVMFYW]-x(3)-Y.
NAME: LAMP glycoproteins transmembrane and cytoplasmic domain signature.

CONSENSUS: C-x(2)-D-x(3,4)-[LIVP](5)-P-[LIVM]-x-[LIVM]-G-x(2)-[LIVM]-x-G-[LIVM](2)-x-[LIVM](4)-A-[FY]-x-[LIVM]-x(2)-[KR]-[RH]-x(1,2)-[STAG](2)-Y-[EQ].

NAME: Glycophorin A signature.

CONSENSUS: I-I-x-[GAC]-V-M-A-G-[LIVM](2).

NAME: PMP-22 / EMP / MP20 family signature 1.

CONSENSUS: [LIVMF](2)-[SA]-T-x(2)-[DNKS]-x-W-x(4,13)-[LIV]-W-x(2)-C.

NAME: PMP-22 / EMP / MP20 family signature 2.

CONSENSUS: [RQ]-[AV]-x-M-[IV]-L-S-x-[LI]-x(4)-[GSA]-[LIVMF](3).

NAME: Oxysterol-binding protein family signature.

CONSENSUS: E-[KQ]-x-S-H-[HR]-P-P-x-[STACF]-A.

NAME: Yeast PIR proteins repeats signature.

CONSENSUS: S-Q-[IV]-[STGNH]-D-G-Q-[LIV]-Q-[AIV]-[STA].

NAME: Seminal vesicle protein I repeats signature.

CONSENSUS: [IVM]-x-G-Q-D-x-V-K-x(5)-[KN]-G-x(3)-[STLV].

NAME: Seminal vesicle protein II repeats signature.

CONSENSUS: [GSA]-Q-x-K-S-[FY]-x-Q-x-K-[SA].

NAME: Serum amyloid A proteins signature.

CONSENSUS: A-R-G-N-Y-[ED]-A-x-[QKR]-R-G-x-G-G-x-W-A.

NAME: Spermadhesins family signature 1.

CONSENSUS: C-G-x(2)-[LI]-x(4)-G-x-I-x(4)-C-x-W-T.

NAME: Spermadhesins family signature 2.

CONSENSUS: C-x-K-E-x-[LIVM]-E-[LIVM]-x-[DE]-x(3)-[GS]-x(5)-K-x-C.

NAME: Stress-induced proteins SRP1/TIP1 family signature.

CONSENSUS: P-W-Y-[ST](2)-R-L.

NAME: Glypicans signature.

CONSENSUS: C-x(2)-C-x-G-[LIVM]-x(4)-P-C-x(2)-[FY]-C-x(2)-[LIVM]-x(2)-G-C.

NAME: Syndecans signature.

CONSENSUS: [FY]-R-[IM]-[KR]-K(2)-D-E-G-S-Y.

NAME: Tissue factor signature.

CONSENSUS: W-K-x-K-C-x(2)-T-x-[DEN]-T-E-C-D-[LIVM]-T-D-E.

NAME: Translationally controlled tumor protein signature 1.

CONSENSUS: [IA]-G-[GAS]-N-[PA]-S-A-E-[GDE]-[PAGE]-x(0,1)-[DEG]-x-[DEN]-x(2)-[DE].
NAME: Translationally controlled tumor protein signature 2.

CONSENSUS: [FL]-[FY]-[IVT]-G-E-x-[MA]-x(2,5)-[DEN]-[GAS]-x-[LV]-[AV]-x(3)-[FY]-[KR]-[DE].


NAME: Tub family signature 1.

CONSENSUS: F-[KHQ]-G-R-V-[ST]-x-A-S-V-K-N-F-Q.

NAME: Tub family signature 2.

CONSENSUS: A-F-[AG]-I-[SAC]-[LIVM]-[ST]-S-F-x-[GST]-K-x-A-C-E.



NAME: HCP repeats signature.

CONSENSUS: H-R-H-R-G-H-x(2)-[DE](7).

NAME: Bacterial ice-nucleation proteins octamer repeat.

CONSENSUS: A-G-Y-G-S-T-x-T.



NAME: Cell cycle proteins ftsW / rodA / spoVE signature.

CONSENSUS: [NV]-x(5)-[GTR]-[LIVMA]-x-P-[PTLIVM]-x-G-[LIVM]-x(3)-[LIVMFW](2)-S-[YSA]-G-G-[STN]-[SA].

NAME: Enterobacterial virulence outer membrane protein signature 1.

CONSENSUS: G-[LIVMFY]-N-[LIVM]-K-Y-R-Y-E.

NAME: Enterobacterial virulence outer membrane protein signature 2.

CONSENSUS: [FYW]-x(2)-G-x-G-Y-[KR]-F.

NAME: Hydrogenases expression/synthesis hypA family signature.

CONSENSUS: F-[CSA]-[FY]-[DE]-[LIVA](2)-x(3)-[ST]-[LIVM]-x(16)-C-x(2)-C-x(12,15)-C-P-x-C.

NAME: Hydrogenases expression/synthesis hupF/hypC family signature.

CONSENSUS: <M-C-[LIV]-[GA]-[LIV]-P-x-[QKR]-[LIV].

NAME: Staphylocoagulase repeat signature.

CONSENSUS: A-R-P-x(3)-K-x-S-x-T-N-A-Y-N-V-T-T-x(2)-[DN]-G-x(3)-Y-G.

NAME: 11-S plant seed storage proteins signature.

CONSENSUS: N-G-x-[DE](2)-x-[LIVMF]-C-[ST]-x(11,12)-[PAG]-D.



NAME: Dehydrins signature 1.

CONSENSUS: S(5)-[DE]-x-[DE]-G-x(1,2)-G-x(0,1)-[KR](4).

NAME: Dehydrins signature 2.

CONSENSUS: [KR]-[LIM]-K-[DE]-K-[LIM]-P-G.

NAME: Germin family signature.

CONSENSUS: G-x(4)-H-x-H-P-x-A-x-E-[LIVM].

NAME: Oleosins signature.

CONSENSUS: [AG]-[ST]-x(5)-[AG]-x(2)-[LIVM]-[SAD]-T-P-[LIVMF](4)-F-S-P-[LIVM](3)-P-A.

NAME: Small hydrophilic plant seed proteins signature.

CONSENSUS: G-[EQ]-T-V-V-P-G-G-T.

NAME: Pathogenesis-related proteins Bet v1 family signature.

CONSENSUS: G-x(2)-[LIVMF]-x(4)-E-x(2)-[CSTAEN]-x(9,10)-[GND]-G-[GS]-[CS]-x(2)-K-x(4)-[FY].
NAME: Pollen proteins Ole e I family signature.

CONSENSUS: [EQ]-G-x-V-Y-C-D-T-C-R.

NAME: Thaumatin family signature.

CONSENSUS: G-x-[GF]-x-C-x-T-[GA]-D-C-x(1,2)-G-x(2,3)-C.



NAME: Mrp family signature.

CONSENSUS: W-x(2)-[LIVM]-D-[LIVMY](4)-D-x-P-P-G-T-[GS]-D.



NAME: Glucose inhibited division protein A family signature 1.

CONSENSUS: [GS]-P-x-Y-C-P-S-[LIVM]-E-x-K-[LIVM]-x-[KR]-F.

NAME: Glucose inhibited division protein A family signature 2.

CONSENSUS: A-G-Q-x-[NT]-G-x(2)-G-Y-x-E-[SAG](3)-[QS]-G-[LIVM](2)-A-G-[LIVMT]-N-A.

NAME: NOL1/NOP2/sun family signature.

CONSENSUS: [FV]-D-[KRA]-[LIVMA]-L-x-D-[AV]-P-C-[ST]-[GA].



NAME: PET112 family signature.

CONSENSUS: [DN]-x-[DN]-R-x-P-L-[LIV]-E-[LIV]-x-[ST]-x-P.

NAME: Protein smpB signature.

CONSENSUS: [TA]-G-[LIVM]-x-L-x-G-x-E-[LIVM]-[KR]-[SA]-[LIVM].

NAME: Hypothetical cof family signature 1.

CONSENSUS: [LIVFYAN]-[LIVMFA]-x(2)-D-[LIVMF]-[ND]-G-T-[LIV]-[LVY]-[STANLM].

NAME: Hypothetical cof family signature 2.

CONSENSUS: [LIVMFC]-G-D-[GSANQ]-x-N-D-x(3)-[LIMFY]-x(2)-[AV]-x(2)-[GSCP]-x(2)-[LMP]-x(2)-[GAS].

NAME: RIO1/ZK632.3/MJ0444 family signature.

CONSENSUS: [LIVM]-V-H-[GA]-D-L-S-E-[FY]-N-x-[LIVM].

NAME: SUA5/yciO/yrdC family signature.

CONSENSUS: [LIVMTA](3)-[LIVMFYC]-[PG]-T-[DE]-[STA]-x-[FY]-[GA]-[LIVM]-[GS].



NAME: Uncharacterized protein family UPF0001 signature.

CONSENSUS: [FW]-H-[FM]-[IV]-G-x-[LIV]-Q-x-[NKR]-K-x(3)-[LIV].
NAME: Uncharacterized protein family UPF0003 signature.

CONSENSUS: G-x-V-x(2)-[LIV]-x(3)-[SA]-x(6)-D-x(3)-[LIVT](3)-P-N-x(5)-[LIVMF](2)-x(5)-N.

NAME: Uncharacterized protein family UPF0004 signature.

CONSENSUS: [LIVM]-x-[LIVMT]-x(2)-G-C-x(3)-C-[STAN]-[FY]-C-x-[LIVM]-x(4)-G.

NAME: Uncharacterized protein family UPF0005 signature.

CONSENSUS: G-[LIVM](2)-[SA]-x(5,8)-G-x(2)-[LIVM]-G-P-x-L-x(4)-[SAG]-x(4,6)-[LIVM](2)-x(2)-A-x(3)-T-A-[LIVM](2)-F.

NAME: Uncharacterized protein family UPF0006 signature 1.

CONSENSUS: [LIVMFY](2)-D-[STA]-H-x-H-[LIVMF]-[DN].



NAME: Uncharacterized protein family UPF0006 signature 2.
CONSENSUS: P-[LIVM]-x-[LIVM]-H-x-R-x-[TA]-x-[DE].

NAME: Uncharacterized protein family UPF0006 signature 3.
CONSENSUS: [LVSA]-[LIVA]-x(2)-[LIVM]-[PS]-x(3)-L-[LIVM]-[LIVMS]-E-T-D-x-P.

NAME: Uncharacterized protein family UPF0007 signature.
CONSENSUS: V-L-[IV]-H-D-[GA]-A-R.

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: S-D-A-G-x-P-x-[LIV]-[SN]-D-P-G.

NAME: Uncharacterized protein family UPF0012 signature.
CONSENSUS: [GTA]-x(2)-[DVT]-C-Y-D-[LIVM]-x-F-P-x(9)-G.



NAME: Uncharacterized protein family UPF0015 signature.

CONSENSUS: [DE]-[LIVMF](3)-R-T-[SG]-G-x(2)-R-x-S-x-[FY]-[LIVM](2)-?-Q.



NAME: Uncharacterized protein family UPF0016 signature.
CONSENSUS: E-[LIVM]-G-D-K-T-F-[LIVMF](2)-A.

NAME: Uncharacterized protein family UPF0017 signature.
CONSENSUS: D-x(8)-[GN]-[LFY]-x(4)-[DET]-[LY]-Y-x(3)-[ST]-x(7)-[IV]-x(2)-[PS]-x-[LIVM]-x-[LIVM]-x(3)-[DN]-D.

NAME: Uncharacterized protein family UPF0011 signature.
CONSENSUS: L-P-V-[VT]-[NL]-F-[AT]-A-G-G-[LIV]-A-T-P-A-D-A-A-[LM].

NAME: Uncharacterized protein family UPF0020 signature.
CONSENSUS: D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.



NAME: Uncharacterized protein family UPF0021 signature.
CONSENSUS: C-K-x(2)-F-x(4)-E-x(22,?)-S-G-G-K-D.

NAME: Uncharacterized protein family UPF0023 signature.
CONSENSUS: D-x-D-E-[LIV]-L-x(4)-V-F-x(3)-S-K-G.

NAME: Uncharacterized protein family UPF0024 signature.

CONSENSUS: G-x-K-D-[KR]-x-A-[LV]-T-x-Q-x-[LIVF]-[SGC].

NAME: Uncharacterized protein family UPF0025 signature.

CONSENSUS: D-V-[LIV]-x(5)-G-H-[ST]-H-x(12)-[LIVMF]-N-P-G.

NAME: Uncharacterized protein family UPF0027 signature.

CONSENSUS: Q-[LIVM]-x-N-x-A-x-[LIVM]-P-x-I-x(6)-[LIVM]-P-D-x-H-x-G-x-G-x(2)-[IV]-G.

NAME: Uncharacterized protein family UPF0026 signature.

CONSENSUS: [GA]-[GS]-G-[GA]-A-R-G-x-[SA]-H-x-G-x(?)-[IV]-x-[IV]-D-x(2)-[GA]-G-x-S-x-G.

NAME: Uncharacterized protein family UPF0029 signature.

CONSENSUS: G-x(2)-[LIVM](2)-x(2)-[LIVM]-x(4)-[LIVM]-x(5)-[LIVM](2)-x-R-[FYW](2)-G-G-x(2)-[LIVM]-G.

NAME: Uncharacterized protein family UPF0030 signature.

CONSENSUS: [GA]-L-I-[LIV]-P-G-G-E-S-T-[STA].



NAME: Uncharacterized protein family UPF0031 signature 1.
CONSENSUS: [SAV]-[IVW]-[LVA]-[LIV]-G-[PNS]-G-L-[GP]-x-[DENQT].

NAME: Uncharacterized protein family UPF0031 signature 2.
CONSENSUS: [GA]-G-x-G-D-[TV]-[LT]-[STA]-G-x-[LIVM].

NAME: Uncharacterized protein family UPF0032 signature.
CONSENSUS: Y-x(2)-F-[LIVMA](2)-x-L-x(4)-G-x(2)-F-[EQ]-[LIVMF]-P-[LIVM].

NAME: Uncharacterized protein family UPF0033 signature.
CONSENSUS: L-[DN]-x(2)-[TAG]-x(2)-C-P-x-P-x-[LIVM].

NAME: Uncharacterized protein family UPF0034 signature.
CONSENSUS: [LIVM]-[DNG]-[LIVM]-N-x-G-C-P-x(3)-[LIVMA](2)-x(5)-G-[SAC].

NAME: Uncharacterized protein family UPF0035 signature.
CONSENSUS: L-L-T-x-R-[SA]-x(3)-R-x(3)-G-x(3)-F-P-G-G.

NAME: Uncharacterized protein family UPF0036 signature.

CONSENSUS: H-x-S-G-H-[GA]-x(3)-[DE]-x(3)-[LM]-x(5)-P-x(3)-[LIVM]-P-x-H-G-[DE].

NAME: Uncharacterized protein family UPF0038 signature.
CONSENSUS: G-x-[LI]-x-R-x(2)-L-x(4)-F-x(8)-[LIV]-x(5)-P-x-[LIV].

NAME: Uncharacterized protein family UPF0044 signature.
CONSENSUS: L-[ST]-x(3)-K-x(3)-[KR]-[SGA]-x-[GA]-H-x-L-x-P-[LIV]-x(5)-[LIV]-[GA]-x(2)-G.

NAME: Uncharacterized protein family UPF0047 signature.
CONSENSUS: S-x(2)-[LIV]-x-[LIV]-x(2)-G-x(?)-G-T-?-?-x-[LIV].

NAME: Uncharacterized protein family UPF0054 signature.
CONSENSUS: H-[GS]-x-L-H-L-[LI]-G-[FY]-D-H.

NAME: Uncharacterized protein family UPF0057 signature.
CONSENSUS: [LIV]-x-[STA]-[LIVF](3)-P-P-[LIVA]-[GA]-[IV]-x(?)-[GKN].

NAME: Hypothetical YER057c/yjgF family signature.

CONSENSUS: P-[AT]-R-[SA]-x-[LIVMY]-x(5)-?-x-L-P-x(4)-[LIVM]-E.

NAME: Hypothetical hesB/yadR/yfhF family signature.
CONSENSUS: F-x-[LIVMFY]-x-N-[PG]-[NS]-x(4)-C-x-C-[GS]-x-S-F.

NAME: Hypothetical yabO/yceC/sfhB family signature.
CONSENSUS: [NHY]-R-[LI]-D-x(2)-T-[ST]-G-[LIVMA]-[LIVMF](2)-[LIVMFG]-[SGA].
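
The consensus patterns in this list are written in PROSITE pattern syntax: elements are separated by hyphens, "x" stands for any residue, square brackets list the residues allowed at a position, braces list excluded residues, and a number in parentheses gives a repeat count or range. As a minimal Python sketch (the function names are hypothetical, and PROSITE's terminal anchors "<" and ">" are deliberately left out), such a pattern can be turned into a regular expression and used to scan a protein sequence:

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (e.g. 'D-P-[LIVMF]-C') into a regex string.

    Handles x, [allowed], {excluded}, single residues and (n) / (n,m) repeats.
    Terminal anchors '<' and '>' are not handled in this sketch.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element.strip())
        if match is None:
            raise ValueError("cannot parse element %r" % element)
        spec, low, high = match.groups()
        if spec == "x":
            regex = "."                          # any residue
        elif spec.startswith("[") and spec.endswith("]"):
            regex = spec                         # allowed residues: same syntax in regex
        elif spec.startswith("{") and spec.endswith("}"):
            regex = "[^" + spec[1:-1] + "]"      # excluded residues
        elif re.fullmatch(r"[A-Z]", spec):
            regex = spec                         # a single fixed residue
        else:
            raise ValueError("unsupported element %r" % element)
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def find_motifs(pattern, sequence):
    """Return (1-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence)]


# Invented test sequence containing one match to the UPF0020 pattern.
print(find_motifs("D-P-[LIVMF]-C-G-[ST]-G-x(3)-[LI]-E.", "MKDPLCGSGAAALEK"))
```

Because `finditer` reports non-overlapping matches only, a scanner that needs overlapping hits could wrap the compiled expression in a lookahead instead.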




Deposit of Clones 

Each clone has been transfected into separate bacterial cells (E. coli) in the composite deposit.

The clones are located at, and publicly available from, the Resource Center of the German Human Genome Project (Heubnerweg 6, 14059 Berlin, Germany), from which each clone comprising a particular polynucleotide is obtainable. The Resource Center library numbers are slightly different from those presented herein, but may be readily obtained by the following key or with the assistance of Resource Center personnel.

The library name becomes a number: brain (hfbr2) becomes 564, kidney (hfkd2) becomes 566, mammary carcinoma (hmcf1) becomes 727, testis (htes3) becomes 434, amygdala (hamy2) becomes 761, melanoma (hmel2) becomes 762 and uterus (hute1) becomes 586. Next, the plate number is converted to two digits (e.g., "2" becomes "02") and is moved behind the plate coordinate, and the underscore is dropped. The following examples are helpful:

Listed Number          Resource Center Number

DKFZphamy2_10h17       DKFZp761H1710
DKFZphfbr2_78m21       DKFZp564M2178
DKFZphfkd2_3k1         DKFZp566K013
DKFZphmcf1_1c23        DKFZp727C231
DKFZhmel2_12j1         DKFZp762J0112
DKFZphtes3_16b5        DKFZp434B0516
DKFZphute1_17k7        DKFZp586K0717
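
The conversion key can be expressed as a small function. This is a minimal sketch: the function name and regular expression are hypothetical, and the padding rule is inferred from the printed examples, which appear to pad the number in the plate coordinate to two digits (for example "k7" becomes "K07") and then append the plate number unchanged. Read that way, the rule reproduces all seven example pairs, so treat it as an inference from the examples rather than the Resource Center's own specification.

```python
import re

# Library codes taken from the key above.
LIBRARY_NUMBERS = {
    "hfbr2": "564",  # brain
    "hfkd2": "566",  # kidney
    "hmcf1": "727",  # mammary carcinoma
    "htes3": "434",  # testis
    "hamy2": "761",  # amygdala
    "hmel2": "762",  # melanoma
    "hute1": "586",  # uterus
}


def to_resource_center_number(listed):
    """Convert a listed clone name (e.g. 'DKFZphamy2_10h17') to a Resource Center number."""
    match = re.fullmatch(r"DKFZp?([a-z]+\d)_(\d+)([a-z])(\d+)", listed)
    if match is None:
        raise ValueError("unrecognized clone name: %r" % listed)
    library, plate, row, column = match.groups()
    # Inferred from the examples: pad the coordinate number, append the plate as printed.
    return "DKFZp%s%s%02d%s" % (LIBRARY_NUMBERS[library], row.upper(), int(column), plate)


print(to_resource_center_number("DKFZphute1_17k7"))
```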
The libraries were constructed using two commercially available vectors. The brain (hfbr2 designations) and kidney (hfkd2 designations) libraries utilize pAMP1 from Life Technologies and are maintained in XL2-Blue (Stratagene); the amygdala (hamy2), testis (htes3) and melanoma (hmel2) libraries are constructed in pSPORT1, also from Life Technologies, and are maintained in DH10B (Life Technologies). In addition to the following techniques, consultation with the commercial literature available on these clones will make evident all of the housekeeping techniques needed to propagate and isolate the individual constructs. All inserts may be excised with a NotI/SalI digestion. Alternatively, universal primers flanking the cloning region may be used to amplify the inserts using PCR methods.

Bacterial cells containing a particular clone can be obtained from the composite deposit as follows:

An oligonucleotide probe or probes should be designed to the sequence that is known for that particular clone. This sequence can be derived from the sequences provided herein, or from a combination of those sequences. Methods of probe design are presented below.

Oligonucleotide probes may be labeled with γ-32P-ATP (specific activity 6000 Ci/mmole) and T4 polynucleotide kinase using commonly employed techniques for labeling oligonucleotides. Other, non-radioactive labeling techniques can also be used. Unincorporated label typically is removed by gel filtration chromatography or other established methods. The amount of radioactivity incorporated into the probe can be quantified by measurement in a scintillation counter. Preferably, the specific activity of the resulting probe should be approximately 4×10^6 dpm/pmole.
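
As a rough worked check on these figures, the ceiling on probe specific activity follows from the ATP's specific activity, because 5' end labeling transfers at most one 32P phosphate per oligonucleotide. The sketch below assumes an ATP specific activity of 6000 Ci/mmole, a target of about 4×10^6 dpm/pmole, and the standard conversion 1 Ci = 2.22×10^12 dpm; the function name is hypothetical.

```python
DPM_PER_CURIE = 2.22e12  # disintegrations per minute in one curie


def max_probe_activity_dpm_per_pmol(atp_ci_per_mmol):
    """Upper bound on 5'-labeled probe activity: one 32P phosphate per oligo."""
    dpm_per_mmol = atp_ci_per_mmol * DPM_PER_CURIE
    return dpm_per_mmol / 1e9  # 1 mmol = 1e9 pmol


max_activity = max_probe_activity_dpm_per_pmol(6000)
target = 4e6
print(f"maximum {max_activity:.3g} dpm/pmol; target is {target / max_activity:.0%} of maximum")
# prints: maximum 1.33e+07 dpm/pmol; target is 30% of maximum
```

The target therefore corresponds to roughly a third of the theoretical maximum, which leaves room for incomplete labeling and decay of the isotope.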

The bacterial culture containing the pool of full-length clones should preferably be thawed and 100 µl of the stock used to inoculate a sterile culture flask containing 25 ml of sterile L-broth containing ampicillin at 50-100 µg/ml (for XL2-Blue strains, 25 µg/ml tetracycline should also be used). The culture should preferably be grown to saturation at 37°C, and the saturated culture should preferably be diluted in fresh L-broth.
saturated culture should preferably be diluted in fresh L-broth. 
Aliquots of these dilutions should preferably be plated to determine the dilution and volume which will yield approximately 5000 distinct and well-separated colonies on solid bacteriological media containing L-broth with ampicillin at 100 µg/ml (for XL2-Blue strains, 25 µg/ml tetracycline should also be used) and agar at 1.5% in a 150 mm petri dish when grown overnight at 37°C. Other known methods of obtaining distinct, well-separated colonies can also be employed.
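
Choosing the dilution is simple arithmetic once the titer of the saturated culture (colony-forming units per ml) has been estimated from the test plates. A minimal sketch, taking the target as roughly 5000 colonies per plate and assuming a fixed plated volume; the function name, the 0.1 ml volume and the example titer are illustrative inputs, not values from the text.

```python
def dilution_for_target(titer_cfu_per_ml, target_colonies=5000, plated_volume_ml=0.1):
    """Fold-dilution that places target_colonies in plated_volume_ml of diluted culture."""
    cfu_needed_per_ml = target_colonies / plated_volume_ml
    return titer_cfu_per_ml / cfu_needed_per_ml


# Illustrative titer only; in practice it comes from counting the test plates.
fold = dilution_for_target(1e9)
print(f"dilute {fold:,.0f}-fold and plate 0.1 ml")
```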

Standard colony hybridization procedures should then be used to transfer the colonies to nitrocellulose filters and lyse, denature and bake them. The filter is then preferably incubated at 65°C for 1 hour with gentle agitation in 6× SSC (20× stock is 175.3 g NaCl/liter, 88.2 g Na citrate/liter, adjusted to pH 7.0 with NaOH) containing 0.5% SDS, 100 µg/ml of yeast RNA, and 10 mM EDTA (approximately 10 mL per 150 mm filter). Preferably, the probe is then added to the hybridization mix at a concentration greater than or equal to 1×10^6 dpm/mL. The filter is then preferably incubated at 65°C with gentle agitation overnight. The filter is then preferably washed in 500 mL of 2× SSC/0.5% SDS at room temperature without agitation, preferably followed by 500 mL of 2× SSC/0.1% SDS at room temperature with gentle shaking for 15 minutes. A third wash with 0.1× SSC/0.5% SDS at 65°C for 30 minutes to 1 hour is optional. The filter is then preferably dried and subjected to autoradiography for sufficient time to visualize the positives on the X-ray film. Other known hybridization methods can also be employed.
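
Taking the 20× SSC stock as 175.3 g NaCl and 88.2 g sodium citrate per liter, the recipe corresponds to roughly 3 M NaCl and 0.3 M citrate, and working strengths follow from C1·V1 = C2·V2. The short sketch below shows that arithmetic; it assumes the citrate is trisodium citrate dihydrate (294.10 g/mol), which the text does not specify, and the function names are hypothetical.

```python
NACL_G_PER_MOL = 58.44
CITRATE_DIHYDRATE_G_PER_MOL = 294.10  # assumed hydrate form of sodium citrate


def molarity(grams_per_liter, molar_mass):
    """Concentration in mol/l from a g/l recipe."""
    return grams_per_liter / molar_mass


def stock_volume_ml(final_x, final_volume_ml, stock_x=20.0):
    """C1*V1 = C2*V2: ml of stock needed to reach final_x in final_volume_ml."""
    return final_x * final_volume_ml / stock_x


print(round(molarity(175.3, NACL_G_PER_MOL), 2))              # NaCl in 20x SSC, mol/l
print(round(molarity(88.2, CITRATE_DIHYDRATE_G_PER_MOL), 2))  # citrate in 20x SSC, mol/l
print(stock_volume_ml(6, 100))                                # ml of 20x stock per 100 ml of 6x SSC
```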

The positive colonies are picked, grown in culture, and plasmid DNA isolated using standard procedures. The clones can then be verified by restriction analysis, hybridization analysis, or DNA sequencing.

Alternatively, clones may be grown as described above, and PCR used to isolate the insert DNAs. Methods of PCR are described below and are otherwise well known.

ERROR SCREENING 

The DNA sequences found herein derive from individual clones, which are publicly available, as noted above. Thus, the skilled artisan will recognize that any specific sequence disclosed herein readily can be screened for errors by resequencing a particular fragment in both directions (i.e., by sequencing both strands). Alternatively, error screening can be performed by amplifying and/or cloning any of the inventive DNAs, using for example RT-PCR, and sequencing the resulting amplified product. In the event that there is a sequencing error, reference should be made to the deposited clone as the correct sequence.
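
Sequencing both strands helps because the reverse-strand read, once reverse-complemented, should agree base for base with the forward read; positions where the two disagree flag candidate errors to check against the deposited clone. A minimal sketch, assuming the two reads are already trimmed to the same span with no insertions or deletions (real reads would first need alignment); the function names and reads are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def strand_disagreements(forward, reverse_read):
    """1-based positions where the forward read and the reverse-complemented reverse read differ."""
    other = reverse_complement(reverse_read)
    if len(other) != len(forward):
        raise ValueError("reads must cover the same span without gaps")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(forward, other)) if a != b]


# Invented reads: the forward read carries a C where the other strand supports a G.
print(strand_disagreements("ATGGCCTAA", "TTACGCCAT"))
```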

USES AND BIOLOGICAL ACTIVITIES OF THE INVENTIVE MOLECULES 

The inventive molecules and their derivatives are susceptible to a wide variety of uses, based on functional and/or structural properties. The skilled worker will appreciate, based on the biological activities detailed below and discussed with regard to the individual sequences herein, that the inventive molecules will find usefulness in numerous therapeutic and diagnostic applications.

The DNA molecules, especially the potassium salts thereof, can be used as fertilizer supplements due to their high nitrogen and phosphorus contents. Since the DNAs are of defined length, they are also useful in gel electrophoresis as molecular weight markers. Due to their similarity with known molecules, certain of the DNA molecules and their variants and derivatives may be used in any number of different diagnostic procedures and therapeutic applications. They may also be used to make the encoded proteins. The proteins themselves have many possible uses. They may be used as a nutritional supplement for humans and animals, and even for laboratory use as, for example, medium for bacterial cultures. Moreover, since the proteins are of defined, known sizes, they may be used as molecular weight markers for gel electrophoresis and gel filtration. Because they are of defined sequences, they also have use in microsequencing and protein fingerprinting applications.

Expression Profiling Applications 

Given their known tissue expression and functional associations, assemblages of the inventive proteins (or corresponding antibodies) and nucleic acids are particularly suited to expression profiling applications. Expression profiling generally entails constructing an array of indicators that signal
WO 01/98454 PCT/1B01/02050 
the presence of a particular RNA or protein expression product. Such arrays can be used to evaluate, for example, pharmacological effectiveness and toxicity. In particular, expression profiles from such arrays can be generated from cells treated with known compounds having known properties, and these profiles can be compared to profiles of unknowns to evaluate similarities and differences, which can be correlated with efficacy or toxicity.
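
Comparing an unknown's profile with those of reference compounds can be as simple as a correlation across the arrayed probes. A minimal sketch, assuming each profile is a list of log expression ratios over the same probes in the same order; the compound names and values are invented, and Pearson correlation is only one reasonable similarity measure, not necessarily the one contemplated here.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def rank_references(unknown, references):
    """Sort reference compounds by similarity to the unknown profile, most similar first."""
    scores = {name: pearson(unknown, profile) for name, profile in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


references = {
    "known_toxicant": [2.0, 1.5, -0.5, 0.1],
    "known_inert": [0.1, -0.1, 0.0, 0.2],
}
unknown = [1.8, 1.2, -0.4, 0.0]
for name, r in rank_references(unknown, references):
    print(f"{name}: r = {r:.2f}")
```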

Additional uses of profiling include diagnosis, tracking development and ascertaining signaling and metabolic pathways. For examples of references describing profiling and its uses, see Farr et al., U.S. Patent 5,811,231 (1998); Seilhamer et al., U.S. Patent 5,840,484 (1998); Rine et al., U.S. Patent No. 5,777,888 (1998); WO 9?/4?317; WO 9?/0????; WO ??/?????; and WO 99/14???. For a device for implementing such techniques, see Lipshutz et al., U.S. Patent No. 5,856,174 (1999) and Anderson et al., U.S. Patent No. 5,922,591 (1999).

In one embodiment, a subset of the inventive DNAs will be arrayed on a substrate, like a gene chip, a filter or a 96-well plate. Test samples containing cells are maintained in the presence of a label capable of incorporation into nascent mRNA. Samples are treated with test and control compounds, which will induce mRNA expression in the sample, resulting in incorporation of label. Whole mRNA is isolated and applied to the array such that it hybridizes with the DNAs contained therein. After washing, the amount of hybridization is quantified and a profile is generated. These steps are repeated with various control and test compounds, thereby generating a library of profiles, which can be used to ascertain the relationships relevant to pharmacological efficacy or toxicity.
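
The quantification step can be sketched as follows, assuming background-subtracted hybridization intensities per arrayed probe for a treated sample and its control. The probe names, the pseudocount and the log2-ratio scale are illustrative choices, not taken from the text.

```python
import math


def expression_profile(treated, control, pseudocount=1.0):
    """Per-probe log2(treated / control) from hybridization intensities.

    The pseudocount (an arbitrary choice here) keeps zero intensities finite.
    """
    return {
        probe: math.log2((treated[probe] + pseudocount) / (control[probe] + pseudocount))
        for probe in control
    }


# Invented intensities for two arrayed probes.
profile = expression_profile(
    treated={"probe_a": 799.0, "probe_b": 99.0},
    control={"probe_a": 199.0, "probe_b": 99.0},
)
# Repeating this for each compound builds up a library of profiles.
library_of_profiles = {"test_compound_1": profile}
print(profile)  # {'probe_a': 2.0, 'probe_b': 0.0}
```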

The matrices used in such profiling, however, need not be limited to those utilizing DNAs. Rather, other nucleic acids, like RNAs and peptide nucleic acids (PNAs), as well as the inventive proteins and antibodies corresponding to the inventive proteins, may also be employed. Hence, for example, antibodies could form the array and the samples could be treated in order to label nascent proteins. Whole proteins then would be isolated and applied to the antibody matrix. Developing the resulting signal would result in a protein expression profile, which is useful in essentially the same manner as the nucleic acid profile. A protein matrix could be used, for example, in evaluating antibody responses to pharmaceutical agents in order to eliminate possible cross-reactivity.

Moreover, where nucleic acids are used in the matrix, it is often beneficial to use variants (as defined below) of the molecules described herein. This can be used to account for genetic variations that are of little or no consequence to the function of the resultant gene product. Hence, they can account for wobble or conservative amino acid variations that do not perturb function, like variations in some of the protein motifs elucidated below. Thus, each position in the matrix can employ multiple nucleic acid probes that account for a series of variants.
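
One way to enumerate such variants for a short stretch of peptide is to expand each residue into its synonymous codons, which captures the wobble-position differences mentioned above. A minimal sketch, using only a handful of entries from the standard genetic code (the table is deliberately partial, and the function name and peptides are illustrative):

```python
from itertools import product

# Partial standard genetic code: only the residues used in the examples below.
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "W": ["TGG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "D": ["GAT", "GAC"],
}


def probe_variants(peptide):
    """All DNA sequences encoding the peptide under the partial codon table."""
    choices = [SYNONYMOUS_CODONS[residue] for residue in peptide]
    return ["".join(codons) for codons in product(*choices)]


variants = probe_variants("MKF")
# 1 x 2 x 2 = 4 variants: ATGAAATTT, ATGAAATTC, ATGAAGTTT, ATGAAGTTC
print(variants)
```

The number of probes grows as the product of the codon counts, so in practice such expansion is limited to short probe stretches.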

Expression profiling may also be done, in another embodiment, using two-dimensional protein gels in which the inventive proteins are detected. The resultant profiles can be used in the same way as described above.

Matrices useful for profiling may be constructed based on different criteria. Of course, the more relevant profiles will take into account expression of most human genes, preferably all of them. In certain situations, however, it is advantageous to look at a smaller subset. For example, if one were concerned about fetal neural toxicity, a fetal brain-specific matrix might be chosen. On the other hand, if one were interested in targeting mammary carcinoma tissue, a corresponding matrix could be used. Thus, matrices may be constructed using all of the sequences available from a tissue-specific library.

* * * 

The following discussion relates to some of the various functional and structural groupings that would be of interest to the artisan wishing to construct profiling matrices. Of course, the artisan will also recognize that these functional descriptions may find additional applicability in the therapeutic and diagnostic applications discussed below.

Cell Cycle 

A proliferating cell must coordinate replication and chromosomal separation to ensure that the genome is replicated completely, and that a single copy is correctly inherited by each daughter cell. The cell cycle is the coordinated series of events that achieves these aims. Many of the key events are initiated by a family of conserved serine/threonine protein kinases, the cyclin-dependent kinases (CDKs), that are activated by the cyclin family of proteins (cyclins A-H). In turn, the cyclin-CDK complexes are modulated by other protein kinases or phosphatases, and by binding specific inhibitor proteins. The enormous variety of ways in which CDK activity can be regulated allows the cell to respond to internal signals generated by preceding events in the cell cycle and to external growth signals.

The somatic cell cycle is divided into four phases: DNA replication (S phase) and chromosome separation (M phase) are separated by gap phases (G1 and G2). At specific control points, the decision to begin the next stage (DNA synthesis or mitosis) is carefully regulated.

Cdk2, the primary kinase, is especially required for the G1-S transition and S phase. Cdk4 and Cdk6 are involved at the restriction point, where the cell can decide to proliferate or arrest (G1->G0), and Cdk7 is a CDK-activating kinase (CAK) as well as a subunit of TFIIH.

The cyclin-CDK complexes are regulated in various ways. One is through phosphorylation, both activating (by CDK-activating kinases, CAK) and inhibitory (by the Tyr15 kinase Wee1), and through dephosphorylation by CDK-associated phosphatases (CAP), like Cdc25A, a member of the Cdc25 family (Cdc25A, B and C).

Another mode of regulation involves two classes of CDK inhibitors (CKI): first, the INK4 proteins p15, p16, p18 and p19, which negatively regulate the cyclin D-CDK complexes, and second, the p21 family with p21, p27 and p57.

The cell cycle is also regulated through ubiquitin-mediated proteolysis involving the destruction of both cyclins and CDK inhibitors by the 26S proteasome, which requires a ubiquitin-conjugating enzyme (UBC) and a ubiquitin ligase. The instability is conferred by PEST regions (cyclins D and E) or by a ten-amino-acid region in the amino terminus (degradation box) in the A- and B-type cyclins.



All these modifications play an important role in cellular localization, because only nuclear CDK-cyclin complexes are functional in the cell cycle. During G1 phase of the cell cycle, cyclins A, E and D are synthesized and bind to their cyclin-dependent kinase (CDK) partners. CDK complexes containing cyclins A, E and D1 are then imported into and concentrated within nuclei. Cdk6-cyclin D3 has been localized to both cytoplasmic and nuclear compartments, although only the nuclear complex is active. As cells enter S phase, cyclin A and cyclin E complexes remain within the nucleus, whereas cyclin D1 relocalizes to the cytoplasm for proteolysis at the onset of S phase. Like Cdk2-cyclin A, Cdc2-cyclin A is nuclear and remains so until it is degraded during mitosis. By contrast, as a result of ongoing nuclear import and more rapid re-export, cyclin B1, which binds to Cdc2 upon synthesis during S phase, is predominantly cytoplasmic. Cdc2-cyclin B2 is also cytoplasmic, although this might occur through anchoring of the complex to some cytoplasmic constituent. At prophase, phosphorylation of cyclin B1 promotes accumulation of Cdc2-cyclin B1 in the nucleus, whereas cyclin B2 remains in the cytoplasm until nuclear envelope breakdown.
Two crucial regulators of Cdc2-cyclin B, Wee1 and Cdc25C, are responsible for the G2-to-M control point. Wee1 is a nuclear protein throughout the cell cycle, whereas Cdc25C binds to 14-3-3 proteins during interphase and remains predominantly cytoplasmic. In some systems Cdc25C, like cyclin B1, rushes precipitously into the nucleus just before entry into mitosis.

The 110-kDa retinoblastoma tumor suppressor protein (RB), a member of the pRB family, is an important regulator of cell-cycle progression and differentiation. By binding members of the E2F family (E2F1-5) and DP family (DP1-3) of transcription activators, RB suppresses inappropriate proliferation, arresting cells in G1 by repressing the transcription of genes required for the transition into S phase. Before the cell proceeds into S phase, RB becomes phosphorylated at multiple sites by the cyclin-dependent protein
kinases (CDKs) and loses its transcriptional repressing activity. Phosphorylation of RB during late G1 phase results in the dissociation of the E2F-RB repressor complex, which allows S-phase-specific genes to be transcribed. Cyclin E is an evolutionarily conserved target of E2F and interacts with Cdk2 in late G1.

For a proliferating cell it is vital that only undamaged DNA is replicated, because if DNA damage is substantial, its replication can lead to chromosome loss or rearrangement. Thus, there is a G1->S checkpoint in late G1 that requires the tumor suppressor p53. A p53-dependent G1 arrest is effected by the cyclin-dependent kinase inhibitor p21, whose increased expression inhibits almost all cyclin-CDK complexes.

The kinase responsible for phosphorylating the unidentified kinetochore component in metaphase may be a member of the MAP kinase family and appears to be the proto-oncogene c-Mos, a cytostatic factor (CSF) in meiosis.

Several categories of proteins are coded for by clones of the invention within the overall group of "Cell cycle" and include, among others, the following:



PA26-T2 protein: PA26-T2 is a p53-responsive gene. The protein is predominantly expressed in brain, breast and kidney and represents a novel regulator of cellular growth. Isoforms are differentially induced by genotoxic stress (UV, gamma-irradiation and cytotoxic drugs) in a p53-dependent manner. The p53 tumor antigen is found in increased amounts in a wide variety of transformed cells. The protein is also detectable in many actively proliferating, nontransformed cells, but it is undetectable or present at low levels in resting cells. p53 is postulated to bind as a tetramer to a p53-binding site (PBS) and to activate the expression of adjacent genes that inhibit growth and/or invasion. Deletion or inactivation of one or both p53 alleles reduces the expression of tetramers, resulting in decreased expression of the growth-inhibitory genes. This mechanism is found in tumors of several types. (OMIM *191170) Clones in this category include: amy2_121m2

Cell structure and motility

One of the major differences between prokaryotes and eukaryotes is the ability of the eukaryotic cell to adopt very different shapes depending on its function during the differentiation process. Animal cells vary from round to extended cylindrical forms, like motor neurons or muscle cells. In humans, more than 100 different cell types can be distinguished, each having a characteristic shape. The form of a cell often is closely related to its capacity to move. Some completely differentiated cells, like fibroblasts, can still change their form actively, thereby migrating. Other cell types serve as motor elements, "macroscopically" like muscle cells or "microscopically" like ciliated epithelia. Such tasks are fulfilled by a large class of proteins, responsible on the one hand for maintenance of cell structure and contact with neighboring cells or the intercellular matrix, and on the other hand for cell motility. These topics cannot be regarded separately: the motility apparatus, for example, must be fixed in the cytoskeleton. Three different types of filaments can be distinguished: actin filaments, tubulin filaments and intermediate filaments, each present in almost all types of cells.


Actin filaments (F-actin) are built up of monomers (G-actin). In muscle cells, actin and myosin, for both of which several paralogous genes are known, as well as many other proteins, are constituents of the contractile apparatus.

The "thin" and "thick" filaments in a muscle cell consist mainly of actin and myosin, respectively.

Several different proteins are responsible for the anchoring of the actin filaments in the Z-disks (e.g., alpha-actinin and desmin) or at the end of the myofibers in the cell membrane.

Troponin I, C and T and tropomyosin, associated with actin, confer the Ca++-dependent triggering of contraction.
The length of the sarcomere is controlled by the giant protein titin.

In smooth muscle, there is no troponin. Instead, contraction is controlled by phosphorylation of myosin by a specialized kinase (myosin light-chain kinase) and by its dephosphorylation. Contractile fibers are not organized in sarcomeres.

Apart from contributing to muscle contraction, the actomyosin system is responsible for many other motions at the cellular level, e.g., the amoeboid movement of pseudopodia or the fission of cells at the end of mitosis by a contractile ring.

Besides this, actin fibers fulfill structural tasks like maintenance of the shape of stereocilia or microvilli. Here, actin filaments are connected by proteins like fimbrin. But not only specialized structures like these contain actin fibers. There is a network covering the complete cell volume with F-actin as a major constituent. Whereas the actin filaments in the structures mentioned above are relatively stable, this F-actin is highly dynamic. Management of the network structure and turnover is achieved by connecting proteins like alpha-actinin, fimbrin or filamin; turnover is regulated by gelsolin, villin, and different capping and fragmentation proteins.

Microtubules are built up of alpha-beta tubulin heterodimers. Turnover of filaments is achieved by addition and release of subunits with different rate constants at the two ends. The resulting cycle is called "treadmilling". Thirteen strings of tubulin dimers (protofilaments) build up one subfiber, whereas one fiber contains two or three of those. A complete axoneme consists of 9 radial and 2 central fibers. This "9 + 2" structure is the basis of cilia and flagella, which grow from basal bodies related to centrioles. In flagella, several additional structures like radial spokes exist. Nexin connects the fibers, and dynein is the motor ATPase which shifts the fibers relative to each other. Several genetic diseases, like Kartagener syndrome, are caused by deficiencies of distinct proteins in cilia.

Besides this, microtubules are abundant in all types of cells. They are part of a delivery system for organelles, e.g., in the Golgi apparatus. A further very important system based on microtubules is the mitotic spindle, which is organized by the centrosomes. Besides many other components, the major part of a centrosome is two centrioles, which are built up of nine microtubule triplets. Most remarkably, new centrioles are not synthesized de novo but generated by duplication of old ones.

Cytoplasmic microtubules are associated with many different proteins. Two major classes are known: the MAPs ("microtubule-associated proteins"), with molecular masses between 200 and 300 kD, and the much smaller tau proteins, with an Mr between 60 and 70 kD. These proteins regulate the treadmilling process and the interaction with other structures in the cell.

Besides actin filaments and microtubules, the so-called intermediate filaments constitute a third class of filaments. In contrast to the former two groups, they do not participate in motility, nor are they dynamic structures subject to rapid turnover. The most important ones are neurofilaments (in neurons), keratin filaments (mainly in epithelial cells), and vimentin filaments (in many different cell types).

The biological function of both the cytoskeleton and the contractile apparatus of a cell does not end at the cell membrane. Cells must be embedded in the extracellular matrix, all cells of a muscle must act as one single mechanical unit, and epithelia must resist macroscopic mechanical forces. Hence, cell adhesion and the extracellular matrix are closely connected to the cytoskeleton. Vinculin is one of the proteins which serve as an anchor for intracellular fibers (actin). Different types of desmosomes and tight junctions connect neighboring cells with intercellular fibers. On the inside, cytoplasmic plaques connect them to the cytoskeleton. These structures, on the one hand, serve as mechanical elements, whereas gap junctions, on the other hand, connect cells metabolically.

The extracellular matrix consists of a network of proteins, glycoproteins and polysaccharides. Different proteins are present in relation to different mechanical demands: elastin is found in tissues with high elasticity (lungs, heart), whereas collagen, a more hard-wearing protein, is found in tendons and ligaments.

Fibronectin is an extracellular protein highly important for cell adhesion.

Reference: Murray J et al (1992): Cell Motil Cytoskeleton 22: 211-223.

Within the overall group of Cell Structure and Motility, several categories of proteins are coded for by clones of the invention:

Ankyrins: Ankyrins are peripheral membrane proteins which interconnect integral proteins with the spectrin-based membrane skeleton. Thus these proteins are involved in coupling the cytoskeleton and the cell membrane. OMIM reports that ankyrins have associations (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with the following diseases: 1) hereditary spherocytosis (OMIM *182900), 2) hemolytic poikilocytic anemia due to reduced ankyrin binding sites (OMIM 14??00), 3) atypical elliptocytosis (OMIM 225450), 4) autosomal recessive spherocytosis (OMIM #270970), 5) Werner syndrome (OMIM *277700), and 6) Rhesus-unlinked type elliptocytosis (OMIM #130600).

Ankyrin-binding glycoproteins mediate ankyrin effects, especially in neuronal adhesion and prostate tumour cell transformation. Clones in this category include: amy2_121f19.

Tropomyosins are ubiquitous proteins of 35 to 45 kD associated with the actin filaments of myofibrils and stress fibers. They are involved in cardiomyopathies (OMIM *191030, *191010, *190990, *600317). Clones in this category include: tes3_16b5.

Differentiation/Development 

Almost every multicellular organism originates from meiotic
cell divisions and the recombination of a paternal and a maternal
set of chromosomes. After fertilization of the egg, all cells of
a body originate from this one cell. Thus the cells of the
developing body are initially genetically alike, but
phenotypically they become very different. They are specialized
to a certain cell type and arranged in an organized pattern to
form a certain type of tissue, and the whole structure has the
well-defined shape of an organ. All these features are determined
by the DNA sequence of the genome, which is reproduced in every
cell. Each cell acts on the genetic instructions given at a
certain time and at a certain place of development and plays its
individual part in the multicellular organism. Cell
differentiation may be divided into three general steps: cell
cycle exit, apoptosis protection and tissue-specific gene
expression. These processes are coordinated to provide the final
and unique tissue characteristics.

An animal cell that has achieved a certain level of
development is said to be determined. This differentiation of a
cell may be irreversible, and in that case the cell may be renewed
only by simple duplication. Other cells are renewed by means of
stem cells, which are immortal (e.g. stem cells of the bone
marrow, epidermal stem cells). The genetic control of development
is extensively studied in invertebrates and vertebrates. The
classical animal model is the fruit fly Drosophila and the
modern model is the transgenic mouse. Animal transgenesis has
proven to be useful for physiological as well as
pathophysiological studies. Besides the approach based on the
random integration of a DNA construct into the mouse genome, gene
targeting can be achieved using pluripotent embryonic stem cells
for targeted transgenesis. Transgenic mice are then derived from
the embryonic stem cells. This allows the introduction of null
mutations into the genome (so-called knock-out) or the control of
transgene expression by the endogenous regulatory sequence
of the gene of interest (so-called knock-in). Mice can be
created that express wild-type genes, mutant genes, marker genes
or cell-lethal genes in a tissue-specific manner. These animal
models allow researchers to follow changes in tissue and organ
development and lead to a better understanding of the cellular
function of many genes or to the generation of animal models for
human diseases. Fundamental problems in immunology, the onset and
development of cancer, the regulation of fatty acid metabolism,
aspects of cardiovascular function, the control of central
nervous system development, and the analysis of reproductive
development and function are only some examples of research
interests.
The final stage of cell differentiation is growth arrest.
In animal tissues with rapid cell turnover, terminally
differentiated cells undergo programmed cell death. The cells
have the ability to kill themselves by activating an intrinsic
cell suicide program when they are no longer needed or have
become seriously damaged. The execution of this program is
termed apoptosis. Apoptosis is important for the development and
homeostasis of animals. The key components of this program have
been conserved in evolution from worms (C. elegans) to insects
(Drosophila) to humans. The roles of apoptosis include the
sculpting of structures during development, deletion of unneeded
cells and tissues, regulation of growth and cell number, and the
elimination of abnormal and potentially dangerous cells. In this
way apoptosis provides a "quality control mechanism" that limits
the accumulation of harmful cells, such as virus-infected cells
and tumor cells. On the other hand, inappropriate apoptosis is
associated with a wide variety of diseases, including AIDS,
neurodegenerative disorders and ischemic stroke. Because it is
now clear that apoptosis is the result of an active, gene-directed
process, it should eventually be possible to manipulate this form
of cell death by developing drugs that interact with its recently
identified mechanisms of action. Inducers of cell
differentiation, cell cycle arrest and apoptosis might be
novel molecular targets for new anticancer agents, in addition to
the signaling pathways for growth factors and cytokines.

Proteins, factors, receptors and genes of importance in
apoptosis:

Proteases:

- Calpain, an intracellular cysteine protease; its exact role is
unknown.

- Caspase-1 to Caspase-11, a family of proteases synthesized
as inactive proenzymes. Targets of the activated enzymes
include: poly(ADP-ribose) polymerase, DNA-dependent protein
kinase, U1 ribonucleoprotein, nuclear lamins and cytoskeleton
components (actin).
- Granzyme B, a serine protease released by cytotoxic T
cells.

Receptors:

- CD95 (synonyms: Fas, APO-1), a receptor protein of the
TNF-receptor family, which includes TNF-R1 and TNF-R2, with the
common characteristic of a 70-amino-acid cytoplasmic domain.

- FADD (synonym: MORT-1), a cytoplasmic protein.

- DR-3 (synonym: APO-3), a member of the TNF-receptor family.

- DR-4 and DR-5.

Genes:

- ced-3, ced-4 and ced-9 encode the general apoptotic and
antiapoptotic program in Caenorhabditis elegans. Apaf-3 is the
mammalian homologue of ced-3.

- Bcl-2 / Bcl-xL / Bax / Bcl-xS / Bak: a large gene family
that can either inhibit or promote apoptosis.

- Cytokine response modifier A, a cowpox virus gene whose
gene product inhibits caspases.

Others:

- Caspase-activated DNase (CAD) and its inhibitor (ICAD):
CAD causes DNA fragmentation in the nucleus.

- Ceramide, a complex lipid that acts as a second messenger.

- c-Jun N-terminal kinase (JNK), a proline-directed kinase.

- p53 protein, essential for the induction of apoptosis
in response to chromosomal damage.

- RAIDD, a death signal-transducing protein.

- Receptor interacting protein (RIP), an accessory protein
with a death domain and serine/threonine kinase activity.

- Sphingomyelinase, an enzyme that hydrolyzes the complex
lipid sphingomyelin to ceramide.

- Tumor necrosis factor (TNF), a type II membrane protein.

- TNF-receptor associated factor 2 (TRAF2), an accessory
protein that can bind to both TNF-R1 and TNF-R2.

Within the overall group of Differentiation/Development,
several categories of proteins are coded for by clones of the
invention:

Notch family proteins: Notch family molecules are negative
regulators of neuronal differentiation in early brain
development. Clones in this category include: amyH_liE l.

Testis-specific Y-encoded proteins: The TSPY genes are
arranged in clusters on the Y chromosome of many mammalian
species. TSPY is believed to function in early spermatogenesis
and is a candidate for GBY, the putative gonadoblastoma-inducing
gene on the Y chromosome. Clones in this category include:
amy2_7jS.

Inflammation-mediating proteins: Inflammation is a basic
mechanism responsible for recruiting and activating
immunocompetent cells. Through various mediators, cells are
activated and triggered to differentiate. Hyperactivation of these
pathways leads to various disease states. In neuronal tissues, in
inflammatory diseases such as experimental autoimmune
encephalomyelitis (EAE), neuritis (EAN) and uveitis (EAU),
allograft inflammatory factor-1 is produced by macrophages and
microglial cells. Clones in this category include: amy2_Ebl c ].

Intracellular transport and trafficking 

Eukaryotic cells rely for their viability on the
partitioning of many basic cellular processes into
membrane-bounded organelles. These are the nucleus, endoplasmic
reticulum (ER), Golgi apparatus, endosomes, lysosomal
compartments, mitochondria and peroxisomes. Most molecules
destined for the lysosome, the cell surface and the outside of the
cell are routed through the ER and Golgi, which, together with the
vesicular intermediates between them, comprise the secretory
pathway (Palade 1975). In the ER and Golgi compartments, proteins
are sorted, modified and often assembled into complexes en route
to their final destination. Incorrectly assembled proteins are
retained in the ER until they fold correctly or are targeted for
degradation. Additional proteins are translocated into and
function within the lumenal spaces of organelles or are secreted.
Thus a large proportion of synthesized proteins require targeting
to membranes, either for insertion into or transport across them.
A major purpose of this is growth. The secretory pathway is
dependent on an intact cytoskeleton and is also closely linked to
general metabolism by affecting ribosome biogenesis (Mizuta and
Warner 1994). A huge number of proteins is required for the
targeting, translocation and sorting of newly synthesized
proteins.

The first step in sorting is the recognition of the cis-acting
targeting or signal sequences that organelle-targeted proteins
contain. This is carried out by cytosolic targeting factors
and/or receptors on the membrane to which the protein is
targeted. In some cases the primary sequences are extremely
degenerate, with only the overall character being conserved
(hydrophobicity for an ER signal sequence, helical amphiphilicity
for a mitochondrial targeting sequence) (Kaiser et al. 1987;
Lemire et al. 1989). Following the targeting step, proteins are
either inserted into or transported across the membrane
(translocated) through a proteinaceous apparatus termed the
translocon. The translocon includes or recruits motors to drive
the translocation process in the correct direction (Schatz and
Dobberstein 1996).
Defined intracellular protein transport steps:

■ ER
- targeting to the ER
- translocation into the lumen of the ER and, depending on the
presence of certain signals in the peptide sequence, transport
through the Golgi complex

■ Mitochondria
- targeting
- translocation

■ Peroxisomes

■ The general secretory pathway
- protein modification, assembly and quality control in the ER
- vesicle-mediated trafficking
- vesicle docking and fusion
- transport through the Golgi apparatus and sorting at the
trans-Golgi
- transport to the cell surface
- transport routes to the lysosome

■ Endocytosis

■ Specialized protein transport routes

■ Protein export from the cytoplasm

References: Palade, G (1975) Science 189; Mizuta et al. (1994)
Mol Cell Biol 14: 2493-2502; Kaiser et al. (1987) Science 235:
312-317; Lemire et al. (1989) J Biol Chem 264: 20206-20215;
Schatz et al. (1996) Science 271: 1519-1526.

Rab proteins 

In eukaryotic cells the compartmentalisation of processes is
a prerequisite for the tight regulation of processes and
activities. The cells contain a highly dynamic set of membrane
compartments that are responsible for packaging, sorting,
secreting and recycling proteins and other molecules.
Trafficking between organelles within the secretory pathway
occurs as vesicles derived from a donor compartment fuse with
specific acceptor membranes, resulting in the directional
transfer of cargo molecules. This process is tightly controlled
by the Rab/Ypt family of proteins (reviewed by Novick and Zerial
1997), a branch of the superfamily of small GTPases. Rab
proteins regulate a variety of functions, including vesicle
translocation and docking at specific fusion sites. Rabs may also
play critical roles in higher-order processes such as modulating
the levels of neurotransmitter release in neurons, a likely
mechanism in the synaptic plasticity that underlies learning and
memory (Geppert and Südhof 1998).

Small GTPases share a common three-dimensional fold that, in
the GTP-bound state, can bind a variety of downstream effector
proteins. GTP hydrolysis leads to a conformational change in the
"switch" regions that renders the GTPase unrecognizable to its
effectors. In this way, by localizing and activating a select set
of effectors, a common structural motif is used to control a wide
array of distinct cellular processes.

The final steps in membrane fusion are likely to be driven
by a set of proteins known as SNAREs. After a vesicle becomes
docked, the cytoplasmic domains of VAMP (also termed
synaptobrevin) and syntaxin on opposing membranes, in combination
with a SNAP-25 molecule, coalesce into an elongated α-helical
bundle (Poirier et al. 1998; Sutton et al. 1998), which may
lead to fusion. Because numerous SNARE isoforms have been
identified that localize to distinct membrane compartments, it
was originally proposed that the specificity of interaction
between the SNARE proteins accounted for the specificity in
membrane trafficking. Recent results, however, suggest that
SNAREs are not specific in their ability to form complexes in
vitro, implying that trafficking specificity requires
additional factors (Yang et al. 1999). In this regard, Rab
proteins are strong candidates for governing the specificity of
vesicle trafficking. Like the SNAREs, many isoforms (40) of the
Rab family have been identified that localize to specific
membrane compartments (reviewed by Novick and Zerial 1997).

Concomitant with the SNARE cycle, Rab proteins undergo an
intricate cycle of membrane and protein interactions. Rabs are
posttranslationally modified at C-terminal cysteines by the
addition of two geranylgeranyl groups, which mediate membrane
association when the Rab is in the GTP-bound state. After
guanine nucleotide hydrolysis occurs, the Rab is extracted from
the membrane upon forming a complex with a cytosolic
GDP-dissociation inhibitor (GDI). This cytosolic intermediate is
then recycled onto a newly forming vesicle, most likely through a
secondary factor termed a GDI dissociation factor (GDF), which
displaces GDI. After the Rab becomes membrane bound, a guanine
nucleotide exchange factor (GEF) promotes the release of GDP and
the subsequent loading of GTP. In its GTP-bound conformation, the
Rab is then free to associate with its specific set of effectors,
which can in turn trigger events leading to the eventual fusion of
the vesicle with a target membrane. To complete the cycle, perhaps
after or concurrent with membrane fusion, a GTPase-activating
protein (GAP) accelerates nucleotide hydrolysis, switching off
the GTPase. The remaining GDP-bound Rab can then participate in a
new round of fusion.
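The Rab cycle described above can be summarized as a small state machine. The sketch below is a toy model, not a kinetic one: it assumes three discrete states and treats each regulator (GDF, GEF, GAP, GDI) as a single step that is allowed only from the state the text assigns to it. The state names and the `apply` and `can_bind_effectors` functions are hypothetical labels introduced for illustration.

```python
from enum import Enum


class RabState(Enum):
    # Hypothetical labels for the three situations described in the text.
    CYTOSOLIC_GDP_GDI = "GDP-bound, held in the cytosol by GDI"
    MEMBRANE_GDP = "GDP-bound, on a membrane"
    MEMBRANE_GTP = "GTP-bound, on a membrane (active)"


# Each regulator acts only on one state, in the order given in the text:
# GDF -> GEF -> (effectors, fusion) -> GAP -> GDI -> back to the start.
TRANSITIONS = {
    ("GDF", RabState.CYTOSOLIC_GDP_GDI): RabState.MEMBRANE_GDP,  # GDI displaced, Rab delivered to a new vesicle
    ("GEF", RabState.MEMBRANE_GDP): RabState.MEMBRANE_GTP,       # GDP released, GTP loaded
    ("GAP", RabState.MEMBRANE_GTP): RabState.MEMBRANE_GDP,       # GTP hydrolysis accelerated
    ("GDI", RabState.MEMBRANE_GDP): RabState.CYTOSOLIC_GDP_GDI,  # Rab extracted from the membrane
}


def apply(regulator: str, state: RabState) -> RabState:
    """Return the next state, or raise ValueError if the regulator cannot act here."""
    try:
        return TRANSITIONS[(regulator, state)]
    except KeyError:
        raise ValueError(f"{regulator} does not act on {state.name}") from None


def can_bind_effectors(state: RabState) -> bool:
    # Only the membrane-associated, GTP-bound Rab engages its effectors.
    return state is RabState.MEMBRANE_GTP


if __name__ == "__main__":
    state = RabState.CYTOSOLIC_GDP_GDI
    for regulator in ["GDF", "GEF", "GAP", "GDI"]:
        state = apply(regulator, state)
        print(f"{regulator:>3} -> {state.name:<18} effector binding: {can_bind_effectors(state)}")
```

Encoding the cycle as an explicit transition table makes the ordering constraints visible: for example, a GEF cannot act on the GDI-bound cytosolic form, so `apply("GEF", RabState.CYTOSOLIC_GDP_GDI)` raises `ValueError`.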

Rab interactions with effectors are likely to regulate
vesicle targeting and membrane fusion in three ways. First, a Rab
may specifically facilitate vectorial vesicle transport. Vesicles
are transported from their site of origin to acceptor
compartments, likely through associations with cytoskeletal
elements and transport motors. A protein has been identified with
a domain structure that suggests a connection between the
cytoskeleton and the Rabs. This protein, called Rabkinesin-6,
contains a kinesin-like ATPase motor domain followed by a
coiled-coil stalk region and a Rab-binding domain (RBD) that
specifically binds Rab6 (Echard et al. 1998). An additional link
with the cytoskeleton is provided by the Rab effector Rabphilin-3A.
Rabphilin-3A has been shown in vitro to interact with α-actinin,
an actin-bundling protein, but only when not bound to Rab3A (Kato
et al. 1996). These results raise the intriguing possibility that
Rab proteins regulate vesicle interactions with the cytoskeleton
and thereby play an active role in targeting vesicles to their
appropriate destinations.

Second, Rab proteins may regulate membrane trafficking at
the vesicle docking step. A number of Rab effectors, including
Rabaptin-5, EEA1, Rabphilin-3A and Rim, may serve as molecular
tethers. Each effector protein contains an RBD, followed by a
linker region (some having the potential to form elongated
coiled-coil structures) and a domain capable of interacting with
a second Rab or the target membrane. Rabaptin-5, for example,
contains two RBDs: one near the N terminus that specifically
recognizes Rab4 and a second near the C terminus that binds Rab5
(Vitale et al. 1998). Both Rim, which is localized to the
target membrane, and Rabphilin-3A, which is localized to the
vesicle, contain N-terminal RBDs and C-terminal Ca2+-binding C2
domains, implicating these effectors in synaptic vesicle
localization or docking in response to Ca2+ influx (Wang et al.
1997). Tethering effectors may also recognize protein complexes
on the acceptor membrane. Sec4p, a yeast Rab3A homolog, interacts
with the exocyst (Guo et al. 1999), a complex of seven or more
subunits that is assembled at sites of vesicle fusion along the
plasma membrane. The exocyst complex may therefore function as a
landmark for Rab/effector-mediated vesicle docking.

Third, once a vesicle has become tethered to its fusion
site, Rab proteins may selectively activate the SNARE fusion
machinery. The mechanism of this activation is unknown but may
involve direct interactions of Rabs or, more likely, their
effectors with SNAREs. For example, Hrs-2 is a protein that binds
to SNAP-25 and contains a Zn2+-finger motif characteristic of
Rab-binding proteins such as Rabphilin-3A, Rim, EEA1 and Noc2,
suggesting that Hrs-2 may form a physical link between Rabs and
SNAREs (Bean et al. 1997). In addition, certain mutations in the
syntaxin-binding protein Sly1p, the Sec1p homolog utilized in
ER-to-Golgi trafficking, eliminate the requirement for Ypt1p, a
Rab protein that functions at this trafficking step (Dascher et
al. 1991). Rabs may therefore regulate SNARE associations through
Sec1 family members. In support of this idea, a Rab effector was
recently found to interact with a vacuolar Rab, a Sec1p homolog
and a SNARE protein (Peterson et al. 1999), which suggests that
this effector serves to connect Rab and SNARE function. In this
way, Rabs and their effectors may facilitate the correct pairing
of SNAREs.

References: Dascher et al. (1991) Mol. Cell. Biol. 11: 872-885;
Echard et al. (1998) Science 279: 580-585; Geppert et al. (1998)
Annu. Rev. Neurosci. 21: 75-95; Guo et al. (1999) EMBO J. 18:
1071-1080; Kato et al. (1996) J. Biol. Chem. 271: 31775-31778;
Novick et al. (1997) Curr. Opin. Cell Biol. 9: 496-504; Peterson
(1999) Curr. Biol. 9: 159-162; Poirier et al. (1998) Nat. Struct.
Biol. 5: 765-769; Vitale et al. (1998) EMBO J. 17: 1941-1951;
Wang et al. (1997) Nature 388: 593-598; Yang et al. (1999)
J. Biol. Chem. 274: 5649-5653.

Within the overall group of Intracellular Transport and
Trafficking, several categories of proteins are coded for by
clones of the invention:

Vesicular trafficking: Various proteins are involved in the
trafficking of vesicles inside the cell and in the exocytotic
pathway. For example, Sec7 of Saccharomyces cerevisiae functions
in vesicular trafficking. Synaptotagmins are essential for
Ca2+-regulated exocytosis of neurosecretory vesicles. Other
proteins, such as dynamin, are microtubule-associated
force-producing proteins, which are involved in the production of
microtubule bundles. By binding and subsequent hydrolysis of GTP,
such proteins provide the motor for vesicular transport during
endocytosis. Clones in this category include: amy2_14b5,
amy_2ol3 and fkd2_3kl.

WO 01/98454 PCT/IB01/02050

Protein sorting: Protein sorting is a process essential for
the maintenance of a cell's functionality and structural
integrity. Most proteins perform their biological function in
special compartments of the cell. The process of sorting is
complex and highly regulated. Clones in this category include:
mel2_?glM.

Metabolism 

This group includes proteins which are involved in the
uptake and consumption of nutrients, and enzymes which are part
of the biochemical pathways for energy metabolism or which are
involved in the supply of building blocks for nucleic acids and
proteins (NTPs, dNTPs, amino acids) for DNA/RNA and protein
synthesis, and of fatty acids (membranes), to allow for the
generation of higher-order structures. This group constitutes the
most important and largest group in prokaryotes and lower
eukaryotes. The higher the evolutionary level of an organism,
however, the more other protein classes like "signal
transduction", "cell cycle" and "differentiation and development"
increase in importance and number of representatives.

Proteins involved in the metabolism of energy and compounds
(here: other than nucleic acids or proteins) are usually the
products of housekeeping genes; they are often constitutively
and/or ubiquitously expressed.

Several categories of proteins are coded for by clones of
the invention within the overall group of Metabolism:

Fatty acid metabolism: OMIM lists more than 50 diseases
caused by pathologically altered fatty acid metabolism.
1-Acylglycerol-3-phosphate acyltransferase is involved in fatty
acid metabolism and is ubiquitously expressed, with a slight
predominance in uterus, placenta and foreskin. Clones in this
category include: amy2_2c22.
Repair and surveillance of protein damage: Several classes
of protein are involved in the repair and surveillance of protein
damage. L-Isoaspartyl methyltransferase (Pimt), for example, is
a highly conserved enzyme utilising S-adenosylmethionine (AdoMet)
to methylate the isoaspartyl residues that arise in proteins
through age-related isomerisation and deamidation. Clones in this
category include: fbr2_7fli21.

Nucleic acid management 

The genetic information is stored in the form of nucleic
acids in all organisms. Two kinds of nucleic acids exist, DNA and
RNA. Whereas the more stable DNA in most organisms constitutes
the storage form of the genetic information, the labile RNA, and
in particular mRNA, is an intermediate used for the temporal
expression of specific genes.

In eukaryotes, DNA is usually a double-stranded linear
molecule consisting of two antiparallel strands made up of a
deoxyribose-phosphate backbone and the four bases A, C, G
and T. The DNA of some organisms has a ring structure. The
structure of DNA was unraveled in 1953 by Watson and Crick. DNA
is a directional molecule; its direction is defined by the 5' and
3' carbon atoms of the sugar.
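Because the two strands are antiparallel and pair A with T and C with G, the sequence of one strand fully determines the other, read in the opposite direction. A minimal Python sketch of that relationship; the function name and the example sequence are illustrative only:

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written 5'->3' like the input.

    The strands are antiparallel, so the partner is read in reverse
    order; each base is replaced by its complement. Characters other
    than A, C, G and T raise KeyError.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    top = "ATGCGTAC"  # hypothetical 5'->3' sequence
    print("5'-" + top + "-3'")
    print("5'-" + reverse_complement(top) + "-3'")
```

Applying the function twice returns the original strand, which is a quick way to check that the pairing table is symmetric.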

The most important processes dealing with nucleic acids are:

• replication (e.g. DNA polymerases, telomerase)

• transcription (RNA polymerases)

• RNA processing (maturation, splicing and degradation)

• in addition, enzymes and proteins exist which require a nucleic
acid (mostly RNA) in the active center to be functional
(ribozymes, e.g. RNases, ribosomal proteins)

The DNA of a cell is replicated in the S phase of the cell
cycle. Several enzymes carry out the task of doubling this
nucleic acid. As with all steps of the cell cycle, the process of
replication is tightly regulated. The enzyme DNA polymerase and
several other proteins are involved in this process. Whereas many
prokaryotes have only one origin of replication (i.e., the
starting point of the replication cycle), eukaryotic DNAs
(chromosomes) contain multiple such start points. The switch from
the synthesis (S) phase to the subsequent G2 or M phases of the
cell cycle is dependent on the completion of replication.
This makes clear that a number of proteins are involved in the
replication itself as well as in the control of the process.
Since most eukaryotic chromosomes are linear structures,
additional proteins and enzymes are necessary to make sure that
the structure is maintained through successive generations. This
includes the proteins necessary to build the three-dimensional
structure of chromosomes (e.g. histones) and the structural
network of the nucleus and nucleolus (including the defined
localization of transcriptionally active genes in the vicinity of
nucleoli), but also enzymes such as telomerase, which guarantees
the integrity of the chromosomal ends.

The expression of genes is usually performed in two steps.
First, a messenger RNA (mRNA) is produced (transcribed) in one to
many copies; second, this mRNA is translated into the protein
product. The regulation of transcription is discussed under the
separate heading "transcription factors", but the classes
"signal transduction", "development", "cell cycle" and others are
also affected, as the expression of certain genes determines the
fate of a cell or organism.

The primary transcript (hnRNA, heterogeneous nuclear RNA)
is a single-stranded one-to-one copy of the gene as it is located
on the chromosome. Before a protein can be translated, the RNA
must mature, and this process begins already during transcription.
First, a 5' cap structure is enzymatically and covalently added
to the RNA, blocking the 5' end of the RNA. Second, when the RNA
polymerase has terminated polymerization, the enzyme poly(A)
polymerase adds varying numbers of adenine residues to the 3' end
of the transcript. The cleavage and polyadenylation machinery
recognizes the sequence AAUAAA or AUUAAA (plus some minor
variations), cuts the RNA 10-30 nucleotides downstream and adds
the A residues. The length of the poly(A) sequence affects the
stability of the RNA. Finally, in the process of splicing, the
introns present at the genomic level and also present in the
hnRNA are spliced out by a multi-component complex consisting of
several proteins and RNAs. The mature mRNA is then exported to
the cytoplasm, where it is translated with the help of the
ribosomes.
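The recognition step described above can be illustrated by scanning a transcript for the two hexamers named in the text and reporting the region 10-30 nucleotides downstream where cleavage would be expected. This is a simplified sketch under stated assumptions: it matches only the exact AAUAAA and AUUAAA sequences (the "minor variations" are ignored), measures distance from the 3' end of the hexamer, and disregards every other feature that influences where cleavage actually occurs. `polya_sites` is a hypothetical helper, not a published tool.

```python
import re
from typing import Iterator, Tuple

# The two hexamers named in the text; other variants are not modelled.
POLYA_SIGNALS = ("AAUAAA", "AUUAAA")


def polya_sites(rna: str, min_dist: int = 10,
                max_dist: int = 30) -> Iterator[Tuple[str, int, Tuple[int, int]]]:
    """Yield (signal, start, window) for every polyadenylation hexamer.

    start is the 0-based index of the hexamer; window is a half-open
    (begin, end) index range lying min_dist..max_dist nucleotides past
    the hexamer's 3' end, clipped to the length of the sequence.
    """
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    n = len(seq)
    for signal in POLYA_SIGNALS:
        # Zero-width lookahead so that overlapping matches are all reported.
        for match in re.finditer(f"(?={signal})", seq):
            start = match.start()
            hexamer_end = start + len(signal)
            window = (min(hexamer_end + min_dist, n),
                      min(hexamer_end + max_dist, n))
            yield signal, start, window


if __name__ == "__main__":
    transcript = "GGG" + "AAUAAA" + "C" * 40  # hypothetical 3' end of a transcript
    for signal, start, (begin, end) in polya_sites(transcript):
        print(f"{signal} at {start}; expected cleavage between {begin} and {end}")
```

The window is only a search region: the sketch says where, by the 10-30 nucleotide rule, cleavage could occur, not which nucleotide is actually cut.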

The half-life of RNA is usually much shorter than that of
DNA. Usually, the mRNA is degraded shortly after synthesis, to
guarantee a well-defined window of expression of a given gene.
This regulation is necessary to specifically maintain or change
the set of proteins present at any time in a cell. Specific
regions in the 3' UTR (untranslated region) determine the
stability of the mRNA in the cytoplasm before it is degraded by
RNases.

References: Watson and Crick (1953) Nature 171: 737-738.

Several categories of proteins are coded for by clones of
the invention within the overall group of "Nucleic acid
management" and include, among others, the following:



Proteins induced by DNA damage: There are several distinct
pathways responsible for the repair of DNA. Nucleotide excision
repair is the most versatile DNA repair pathway and is the main
defense of mammalian cells against UV-induced DNA damage. Defects
in proteins involved in this pathway can lead to inherited
disorders (such as xeroderma pigmentosum, OMIM *278700, *278720,
*278740 and m^DD; Cockayne's syndrome, OMIM *216400; and
trichothiodystrophy, OMIM #601675). The study of UV-sensitive yeast
RAD mutants has greatly aided the understanding of this pathway
and has revealed strong conservation of the components of
nucleotide excision repair in eukaryotes. Clones in this category
include: amy2_lln l 4 and tes3_10ilb.

Proteins involved in loading of transfer RNAs: Transfer RNAs
must be coupled to an amino acid, which is then transported to the
peptidyl-transferase centre of the ribosome. Clones in this
category include: fbr2_7flcl2.

Cytosolic ribosomal proteins: Several proteins are part of
the eukaryotic ribosomal peptidyl transferase centre or modulate
the activity of this centre. Such proteins can find application
in the modulation of ribosome assembly, maintenance and activity.
Clones in this category include: amy21il.

Histones: Histones are DNA-binding proteins responsible not
only for DNA structure, folding and packing, but are also thought
to be involved in the activation and silencing of large
chromosomal regions. Clones in this category include: tes3_31al0.

mRNA-binding proteins: mRNA-binding proteins are involved in
the regulation of mRNA folding, translation and stability. For
example, the VILIP protein binds specifically to the
3' untranslated region of the neurotrophin receptor mRNA. Clones
in this group include amyE_Hgl2.

Signal transduction 

Cells in higher organisms need to communicate continuously
with their environment, especially with other cells of the same
organism, in order to maintain the function and specialization of
the whole system these cells are part of. This important task of
communication is performed with the help of cell-surface
receptors, which receive signals from outside the cell and
transmit them into it.

G-proteins

The largest known family of cell-surface receptors is that of the
G-protein-coupled receptors, which mediate the transmission of
diverse stimuli such as neurotransmitters, glycopeptides,
hormones, peptides, odorant molecules and photons. The
functional unit of these receptors is composed of the receptor
molecule itself (GPCR), which is anchored in the cytoplasmic
membrane by seven membrane-spanning domains; the heterotrimeric
G-protein, which is composed of α- and βγ-subunits (Gα and Gβγ);
and the effectors that interact with Gα and/or Gβγ. In
particular, the dissociated Gα and Gβγ can regulate the
activities of a number of effector molecules such as adenylate
cyclases, phospholipase C isoforms, ion channels and tyrosine
kinases, resulting in a variety of cellular functions. The
process of signal transduction must be tightly regulated and
reversible in order to avoid overstimulation, to achieve signal
termination, and to render the receptor responsive to subsequent
stimuli (Iacovelli L. et al. (1999) FASEB J. 13: 1-8; Hamm H.E.
(1998) J. Biol. Chem. 273: 669-672).

G-proteins are GTPases that, upon binding of GTP, change
their conformation, which in turn unmasks structural motifs,
in particular the so-called effector loop, which can mediate the
interactions with target proteins, or effectors, of the GTPases.
This ability enables the GTPases to cycle between active,
GTP-bound and inactive, GDP-bound conformations and in the process
to function as molecular traffic lights in a multitude of signal
transduction pathways. The most important of the signal
transduction pathways regulated with the help of G-proteins
are the phospholipase C / protein kinase C pathway and the
adenylate cyclase / protein kinase A pathway.

The cycling of GTPases is tightly regulated by three main
classes of proteins: the exchange of hydrolyzed GDP for a fresh
GTP is facilitated by guanine nucleotide exchange factors
(GEFs), the hydrolysis of GTP to GDP is sped up by
GTPase-activating proteins (GAPs), and the dissociation of GDP
from the GTPases is inhibited by GDP dissociation inhibitors
(GDIs) (Tapon and Hall (1997) Curr. Opin. Cell Biol. 9: 86-92;
Van Aelst and D'Souza-Schorey (1997) Genes Dev. 11: 2295-2322).

SOC-family

A conserved motif that was originally identified in proteins
that negatively regulate the signaling action of cytokines was
termed the SOCS box (Suppressor Of Cytokine Signaling). Based on
homology, five distinct structural protein classes that carry
this motif have since been identified. The function of most of
these proteins is presently not known. Common to these proteins is
only the SOCS box, which is located near the C-terminus of the
respective peptides. Recently, the SOCS box has been demonstrated
to induce binding of proteins to elongins B and C, which could
target the proteins (and bound substrates) to the proteasomal
protein degradation pathway (Kamura T. et al. (1998) Genes Dev.
12: 3872-3881; Zhang J.-G. et al. (1999) Proc. Natl. Acad. Sci.
USA 96: 2071-2076).

The class in which the SOCS box was originally described contains several members (SOCS-1 to SOCS-7 and CIS). In addition to the SOCS box, these proteins contain an SH2 (Src homology 2) domain and a variable N-terminus. These SOCS proteins appear to form part of a classical negative feedback loop that regulates cytokine signal transduction: upon cytokine stimulation, expression of SOCS proteins is rapidly induced, and the proteins inhibit further cytokine action. The mode of action of the SOCS proteins varies. While SOCS-1 binds and inhibits the JAK (Janus kinase) family of cytoplasmic protein kinases (Narazaki, M. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 13130-13134; Nicholson, S.E. et al. (1999) EMBO J. 18, 375-385), CIS appears to act by competing with signaling molecules such as the STAT (Signal Transducers and Activators of Transcription) family for binding to phosphorylated receptor cytoplasmic domains (Yoshimura, A. et al. (1995) EMBO J. 14, 2816-2826; Matsumoto, A. et al. (1997) Blood 89, 3148-3154).

A second class of SOCS box proteins additionally contains WD-40 repeats, which were initially identified in the mouse WSB-1 and -2 proteins. The functions of WD-40 proteins are not completely understood but seem to be rather divergent. In Cdc4p the WD-40 repeats are probably necessary for binding the substrate for Cdc34p (Mathias, N. et al. (1999) Mol. Cell Biol. 19, 1759-1767). Cdc4p is a component of a ubiquitin ligase that tethers the ubiquitin-conjugating enzyme Cdc34p to its substrates. Posttranslational modification of a protein by ubiquitin usually results in rapid degradation of the ubiquitinated protein by the proteasome. The transfer of ubiquitin to the substrate is a multistep process in which WD-40 repeats might play an important role. Other WD-40-containing proteins (e.g. the retinoblastoma-binding protein RbAp48) have been shown to bind metal ions (zinc), and this metal binding might mediate and/or regulate protein-protein interactions that are functionally important in chromatin metabolism (Kenzior, A.L. and Folk, W.R. (1998) FEBS Lett. 440, 425-429). These proteins are involved in the RAS-cAMP pathway that regulates cellular growth (Ach, R.A. et al. (1997) Plant Cell 9).

The SPRY domain has been identified in pyrin (also called marenostrin), a protein that is mutated in patients with familial Mediterranean fever and that is similar to the butyrophilin family. While butyrophilins seem to be involved in the lactation process in mammals, the function of pyrin is unknown. Three proteins (SSB-1 to -3) have been identified that contain both SPRY and SOCS box motifs. The function of these proteins is also not known.

Ankyrin-repeat-containing proteins share a 33-residue repeating motif, an L-shaped structure with protruding β-hairpin tips that mediate specific macromolecular interactions with cytoskeletal, membrane, and regulatory proteins. These proteins play fundamental roles in diverse biological activities including growth and development, intracellular protein trafficking, the establishment and maintenance of cellular polarity, cell adhesion, signal transduction, and mRNA transcription. Three ankyrin-repeat proteins (ASB-1 to -3) have been identified that contain a C-terminal SOCS box in addition to the ankyrin repeats. The function of these proteins and of their individual domains remains to be discovered (Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 114-119).

A few small GTPases (RAR and RAR-like) also contain a SOCS box. GTPases are involved in signal transduction during cellular communication. The function of the SOCS box in this type of protein is currently unclear (Hilton, D.J. et al. (1998) Proc. Natl. Acad. Sci. USA 95, 114-119).

Ca2+ as second messenger

The bivalent cation Ca2+ is, besides cAMP, one of the two major second messengers in eukaryotic cells. Its intracellular concentration is tightly regulated and usually kept very low compared to the cell's environment. Ca2+-binding proteins and transporters (gap junctions, voltage-gated and second-messenger-gated channels) help to sequester large amounts of the ion in various organelles, from which Ca2+ can be released upon extracellular stimuli. For example, muscle contraction depends on the presence of Ca2+ ions, which are then readily transported back into the organelles so that the muscle can relax. In signal transduction, Ca2+ functions as a second messenger that activates Ca2+-dependent processes through the activation of Ca2+/calmodulin-dependent protein kinases (CaM kinases), which are the major effector molecules of Ca2+. In the signaling cascades, the CaM-dependent kinases activate phospholipases (e.g. phospholipase C) that in turn activate other protein kinases such as protein kinase C.

cAMP

Cyclic AMP is produced by the enzyme adenylate cyclase in response to extracellular signals. Certain G-proteins stimulate the activity of adenylate cyclase, which converts ATP to cAMP and PPi. Two molecules of cAMP bind to each of the two regulatory subunits of cAMP-dependent protein kinase, which in turn dissociate from the two catalytic subunits of the R2C2 heterotetramer. Once released, the C-subunits become active and phosphorylate substrate proteins at Ser and Thr residues. The process leading from the binding of extracellular molecules to their receptors, through transmission of the stimulus into the cell and activation of adenylate cyclase, to the subsequent activation of cAMP-dependent protein kinase is one of the two major signal transduction pathways in eukaryotic cells. Since the phosphorylation of proteins is a posttranslational modification, the kinases are described in the class "signal transduction".

SARA

Members of the transforming growth factor β (TGFβ) superfamily signal through a family of cell-surface transmembrane serine/threonine kinases known as type I and type II receptors (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué, 1998). Ligand induces the formation of heteromeric complexes of these receptors, and signaling is initiated when receptor I is phosphorylated and activated by the constitutively active kinase of receptor II (Wrana et al., 1994). The activated type I receptor kinase then propagates the signal to a family of intracellular signaling mediators known as Smads (a contraction of the C. elegans Sma and Drosophila Mad genes, which were the first identified members of this class of signaling effectors).

Three classes of Smads with distinct functions have been defined: the receptor-regulated Smads, which include Smad1, 2, 3, 5, and 8; the common mediator Smad, Smad4; and the antagonistic Smads, which include Smad6 and 7 (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué, 1998). Receptor-regulated Smads (R-Smads) act as direct substrates of specific type I receptors and are phosphorylated on the last two serines at the carboxyl terminus, within a highly conserved SSXS motif (Macías-Silva et al., 1996; Abdollah et al., 1997; Kretzschmar et al., 1997; Liu et al., 1997b; Souchelnytskyi et al., 1997). Regulation of R-Smads by the receptor kinase provides an important level of specificity in this system. Thus, Smad2 and Smad3 are substrates of TGFβ or activin receptors and mediate signaling by these ligands (Macías-Silva et al., 1996; Liu et al., 1997b; Nakao et al., 1997), whereas Smad1, 5, and 8 are targets of BMP receptors and propagate BMP signals (Hoodless et al., 1996; Chen et al., 1997b; Kretzschmar et al., 1997; Nishimura et al., 1998). Once phosphorylated, R-Smads associate with the common Smad, Smad4 (Lagna et al., 1996; Zhang et al., 1997), and mediate nuclear translocation of the heteromeric complex. In the nucleus, Smad complexes then activate specific genes through cooperative interactions with DNA and other DNA-binding proteins such as FAST1, FAST2, and Fos/Jun (Chen et al., 1996; Chen et al., 1997a; Liu et al., 1997a; Labbé et al., 1998; Zhang et al., 1998; Zhou et al., 1998). In contrast to R-Smads and Smad4, the antagonistic Smads, Smad6 and 7, appear to function by blocking ligand-dependent signaling (reviewed in Heldin et al., 1997).
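Because R-Smad phosphorylation is tied to a C-terminal S-S-X-S motif, the rule can be expressed as a short sequence check. The Python sketch below is only an illustration under stated assumptions: the function name `ssxs_phosphosites` and the test peptide are hypothetical, the motif is assumed to occupy exactly the last four residues, and "the last two serines" is taken to mean positions 2 and 4 of S-S-X-S.

```python
import re
from typing import Optional, Tuple


def ssxs_phosphosites(seq: str) -> Optional[Tuple[int, int]]:
    """Return 1-based positions of the last two serines of a C-terminal
    S-S-X-S motif (X = any residue), or None if the protein does not end in SSXS."""
    seq = seq.upper()
    # Anchor at the end of the string: the motif must be the carboxyl terminus.
    if re.search(r"SS.S$", seq) is None:
        return None
    n = len(seq)
    # In S-S-X-S the last two serines are motif positions 2 and 4,
    # i.e. residues n-2 and n of the full sequence (1-based).
    return (n - 2, n)


print(ssxs_phosphosites("PQCSSMS"))  # hypothetical peptide ending in SSMS
print(ssxs_phosphosites("PQCSSMA"))  # no terminal SSXS motif
```

The check deliberately ignores internal SSXS matches, since the text places the phosphorylated serines at the very carboxyl terminus.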
Phosphorylation of R-Smads by the type I receptor is essential for activating the TGFβ signaling pathway (Heldin et al., 1997; Attisano and Wrana, 1998; Kretzschmar and Massagué, 1998). However, little is known about how the interaction of Smads with receptors is controlled. A novel Smad2/Smad3-interacting protein has been described (Tsukazaki, T. et al., 1998) that contains a double zinc finger, or FYVE domain, and that has been called SARA (Smad anchor for receptor activation). SARA recruits Smad2 into distinct subcellular domains and co-localizes and interacts with TGFβ receptors. TGFβ signaling induces dissociation of Smad2 from SARA with concomitant formation of Smad2/Smad4 complexes and nuclear translocation. Moreover, deletion of the FYVE domain in SARA causes mislocalization of Smad2 and inhibits TGFβ-dependent transcriptional responses. Thus, SARA defines a component of TGFβ signaling that functions to recruit Smad2 to the receptor by controlling the subcellular localization of Smad.

References: Abdollah et al. (1997) J. Biol. Chem. 272, 27678-27685; Attisano et al. (1998) Curr. Opin. Cell Biol. 10, 188-194; Chen et al. (1996) Nature 383, 691-696; Chen et al. (1997a) Nature 389, 85-89; Chen et al. (1997b) Proc. Natl. Acad. Sci. USA 94, 12938-12943; Heldin et al. (1997) Nature 390, 465-471; Hoodless et al. (1996) Cell 85, 489-500; Kretzschmar et al. (1998) Curr. Opin. Genet. Dev. 8, 103-111; Kretzschmar et al. (1997) Genes Dev. 11, 984-995; Labbé et al. (1998) Mol. Cell 2, 109-120; Lagna et al. (1996) Nature 383, 832-836; Liu et al. (1997a) Genes Dev. 11, 3157-3167; Liu et al. (1997b) Proc. Natl. Acad. Sci. USA 94, 10669-10674; Macías-Silva et al. (1996) Cell 87, 1215-1224; Nakao et al. (1997) EMBO J. 16, 5353-5362; Nishimura et al. (1998) J. Biol. Chem. 273, 1872-1879; Souchelnytskyi et al. (1997) J. Biol. Chem. 272, 28107-28115; Tsukazaki et al. (1998) Cell 95, 779-791; Wrana et al. (1994) Nature 370, 341-347; Zhang et al. (1997) Curr. Biol. 7, 270-276; Zhang et al. (1998) Nature 394, 909-913; Zhou et al. (1998) Mol. Cell 2, 121-127.




Rab proteins 

In eukaryotic cells the compartmentalization of processes is a prerequisite for the tight regulation of processes and activities. The cells contain a highly dynamic set of membrane compartments that are responsible for packaging, sorting, secreting, and recycling proteins and other molecules. Trafficking between organelles within the secretory pathway occurs as vesicles derived from a donor compartment fuse with specific acceptor membranes, resulting in the directional transfer of cargo molecules. This process is tightly controlled by the Rab/Ypt family of proteins (reviewed by Novick and Zerial, 1997), a branch of the superfamily of small GTPases. Rab proteins regulate a variety of functions, including vesicle translocation and docking at specific fusion sites. Rabs may also play critical roles in higher-order processes such as modulating the levels of neurotransmitter release in neurons, a likely mechanism in the synaptic plasticity that underlies learning and memory (Geppert and Südhof, 1998).

Small GTPases share a common three-dimensional fold that, in the GTP-bound state, can bind a variety of downstream effector proteins. GTP hydrolysis leads to a conformational change in the "switch" regions that renders the GTPase unrecognizable to its effectors. In this way, by localizing and activating a select set of effectors, a common structural motif is used to control a wide array of distinct cellular processes.

The final steps in membrane fusion are likely to be driven by a set of proteins known as SNAREs. After a vesicle becomes docked, the cytoplasmic domains of VAMP (also termed synaptobrevin) and syntaxin on opposing membranes, in combination with a SNAP-25 molecule, coalesce into an elongated α-helical bundle (Poirier et al., 1998; Sutton et al., 1998), which may lead to fusion. Because numerous SNARE isoforms have been identified that localize to distinct membrane compartments, it was originally proposed that the specificity of interaction between the SNARE proteins accounted for the specificity of membrane trafficking. Recent results, however, indicate that SNAREs are not specific in their ability to form complexes in vitro, suggesting that trafficking specificity requires additional factors (Yang et al., 1999). In this regard, Rab proteins are strong candidates for governing the specificity of vesicle trafficking. Like the SNAREs, many isoforms (40) of the Rab family have been identified that localize to specific membrane compartments (reviewed by Novick and Zerial, 1997).

Concomitant with the SNARE cycle, Rab proteins undergo an intricate cycle of membrane and protein interactions. Rabs are posttranslationally modified at C-terminal cysteines by the addition of two geranylgeranyl groups, which mediate membrane association when the Rab is in the GTP-bound state. After guanine nucleotide hydrolysis, the Rab is extracted from the membrane by forming a complex with a cytosolic GDP-dissociation inhibitor (GDI). This cytosolic intermediate is then recycled onto a newly forming vesicle, most likely through a secondary factor termed a GDI dissociation factor (GDF), which displaces GDI. After the Rab becomes membrane-bound, a guanine nucleotide exchange factor (GEF) promotes the release of GDP and the subsequent loading of GTP. In its GTP-bound conformation, the Rab is then free to associate with its specific set of effectors, which can in turn trigger events leading to the eventual fusion of the vesicle with a target membrane. To complete the cycle, perhaps after or concurrent with membrane fusion, a GTPase-activating protein (GAP) accelerates nucleotide hydrolysis, switching off the GTPase. The remaining GDP-bound Rab can then participate in a new round of fusion.
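The Rab cycle described above can be read as a small state machine in which each regulator (GDF, GEF, GAP, GDI) acts on exactly one state. The Python sketch below models only that bookkeeping; the state names and the `step` and `binds_effectors` helpers are hypothetical illustrations, and kinetics, effectors, and SNAREs are deliberately left out.

```python
from enum import Enum


class RabState(Enum):
    CYTOSOLIC_GDP_GDI = "GDP-bound, held in the cytosol by GDI"
    MEMBRANE_GDP = "GDP-bound, anchored in a membrane via geranylgeranyl groups"
    MEMBRANE_GTP = "GTP-bound, membrane-associated, effector-competent"


# (regulator, current state) -> next state, following the cycle in the text.
TRANSITIONS = {
    ("GDF", RabState.CYTOSOLIC_GDP_GDI): RabState.MEMBRANE_GDP,  # GDI displaced, Rab inserted
    ("GEF", RabState.MEMBRANE_GDP): RabState.MEMBRANE_GTP,       # GDP released, GTP loaded
    ("GAP", RabState.MEMBRANE_GTP): RabState.MEMBRANE_GDP,       # GTP hydrolysed: switched off
    ("GDI", RabState.MEMBRANE_GDP): RabState.CYTOSOLIC_GDP_GDI,  # GDP-Rab extracted from membrane
}


def step(state: RabState, regulator: str) -> RabState:
    """Apply one regulator; raise ValueError if it does not act on this state."""
    key = (regulator, state)
    if key not in TRANSITIONS:
        raise ValueError(f"{regulator} does not act on a Rab in state {state.name}")
    return TRANSITIONS[key]


def binds_effectors(state: RabState) -> bool:
    """Only the GTP-bound conformation is recognised by effectors."""
    return state is RabState.MEMBRANE_GTP


if __name__ == "__main__":
    state = RabState.CYTOSOLIC_GDP_GDI
    for regulator in ["GDF", "GEF", "GAP", "GDI"]:
        state = step(state, regulator)
        print(f"{regulator:>3} -> {state.name:<18} binds effectors: {binds_effectors(state)}")
```

Running the four regulators in order returns the Rab to its cytosolic GDI complex, closing the cycle; applying a regulator out of order (for example a GAP to a cytosolic Rab) is rejected.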

Rab interactions with effectors are likely to regulate vesicle targeting and membrane fusion in three ways. First, a Rab may specifically facilitate vectorial vesicle transport. Vesicles are transported from their site of origin to acceptor compartments, likely through associations with cytoskeletal elements and transport motors. A protein has been identified whose domain structure suggests a connection between the cytoskeleton and the Rabs. This protein, called Rabkinesin-6, contains a kinesin-like ATPase motor domain followed by a coiled-coil stalk region and a Rab-binding domain (RBD) that specifically binds Rab6 (Echard et al., 1998). An additional link with the cytoskeleton is provided by the Rab effector Rabphilin-3A, which has been shown in vitro to interact with α-actinin, an actin-bundling protein, but only when not bound to Rab3A (Kato et al., 1996). These results raise the intriguing possibility that Rab proteins regulate vesicle interactions with the cytoskeleton and thereby play an active role in targeting vesicles to their appropriate destinations.
Second, Rab proteins may regulate membrane trafficking at the vesicle docking step. A number of Rab effectors, including Rabaptin-5, EEA1, Rabphilin-3A, and Rim, may serve as molecular tethers. Each of these effector proteins contains an RBD followed by a linker region (some having the potential to form elongated coiled-coil structures) and a domain capable of interacting with a second Rab or with the target membrane. Rabaptin-5, for example, contains two RBDs, one near the N terminus that specifically recognizes Rab4 and a second near the C terminus that binds Rab5 (Vitale et al., 1998). Both Rim, which is localized to the target membrane, and Rabphilin-3A, which is localized to the vesicle, contain N-terminal RBDs and C-terminal Ca2+-binding C2 domains, implicating these effectors in synaptic vesicle localization or docking in response to Ca2+ influx (Wang et al., 1997). Tethering effectors may also recognize protein complexes on the acceptor membrane. Sec4p, a yeast Rab3A homolog, interacts with the exocyst (Guo et al., 1999), a complex of seven or more subunits that is assembled at sites of vesicle fusion along the plasma membrane. The exocyst complex may therefore function as a landmark for Rab/effector-mediated vesicle docking.

Third, once a vesicle has become tethered to its fusion site, Rab proteins may selectively activate the SNARE fusion machinery. The mechanism of this activation is unknown but may involve direct interactions of Rabs or, more likely, of their effectors with SNAREs. For example, Hrs-2 is a protein that binds to SNAP-25 and contains a Zn2+-finger motif characteristic of Rab-binding proteins such as Rabphilin-3A, Rim, EEA1, and Noc2, suggesting that Hrs-2 may form a physical link between Rabs and SNAREs (Bean et al., 1997). In addition, certain mutations in the syntaxin-binding protein Sly1p, the Sec1p homolog used in ER-to-Golgi trafficking, eliminate the requirement for Ypt1p, a Rab protein that functions at this trafficking step (Dascher et al., 1991). Rabs may therefore regulate SNARE associations through Sec1 family members. In support of this idea, a Rab effector was recently found to interact with a vacuolar Rab, a Sec1p homolog, and a SNARE protein (Peterson et al., 1999), which suggests that this effector serves to connect Rab and SNARE function. In this way, Rabs and their effectors may facilitate the correct pairing of SNAREs.

References: Dascher et al. (1991) Mol. Cell. Biol. 11, 872-885; Echard et al. (1998) Science 279, 580-585; Geppert et al. (1998) Annu. Rev. Neurosci. 21, 75-95; Guo et al. (1999) EMBO J. 18, 1071-1080; Kato et al. (1996) J. Biol. Chem. 271, 31775-31778; Novick et al. (1997) Curr. Opin. Cell Biol. 9, 496-504; Peterson et al. (1999) Curr. Biol. 9, 159-162; Poirier et al. (1998) Nat. Struct. Biol. 5, 765-769; Vitale et al. (1998) EMBO J. 17, 1941-1951; Wang et al. (1997) Nature 388, 593-598; Yang et al. (1999) J. Biol. Chem. 274, 5649-5653.

Kinases 
Reversible posttranslational modifications of proteins are a major means of regulating cellular activities. Among the various modifications carried out by cells, the addition of phosphoryl groups to Ser/Thr or Tyr residues is the most important and widely used. The phosphorylation of proteins is accomplished by protein kinases, while the reverse reaction, the removal of phosphoryl groups, is carried out by phosphatases. Kinases and phosphatases occupy key positions, e.g. in the processes of cell proliferation, differentiation, and communication/signaling. These processes must be tightly regulated in order to maintain a steady state of cellular fate. Misregulation of kinase (or phosphatase) activities has been held responsible for a multitude of disease processes such as oncogenesis, inflammatory processes, arteriosclerosis, and psoriasis.

Protein kinases constitute the largest protein family currently known; several hundred kinases have already been identified. Classically, kinases are subdivided into two classes based on the amino acid residues in their substrates that they phosphorylate: the kinases specifically transfer phosphoryl groups from adenosine triphosphate (ATP) or, less frequently, guanosine triphosphate (GTP) either to serine and/or threonine residues or to tyrosine residues of substrate proteins. An estimated 1,000 to 10,000 proteins present in a typical mammalian cell are believed to be regulated by the action of protein kinases.

Protein kinases are frequently integral parts of signaling cascades that transmit extracellular stimuli (e.g. hormones, neurotransmitters, growth or differentiation factors) into the cell and result in various cellular responses. The kinases play key roles in these cascades because they act as a kind of "molecular switch", turning the activities of other enzymes and proteins on or off, e.g. metabolic enzymes, regulatory proteins, channels and pumps, receptors, cytoskeletal proteins, and transcription factors.

The regulation of kinase activities is accomplished by various means.

The best-characterized example of regulation via regulatory subunits is the cAMP-dependent protein kinase (PKA), which is also a prototype for second-messenger-activated protein kinases. This enzyme is a heterotetramer of two catalytic (C) and two regulatory (R) subunits. Upon binding of two molecules of the second messenger (cAMP) to each R subunit, the catalytic subunits are released and become active. Several isoforms exist of both the catalytic and the regulatory subunits. The combination of catalytic and regulatory subunits determines the localization of the holoenzyme and also the spectrum of substrates available for phosphorylation. The consensus pattern that must be present in a substrate for PKA action is RRXS/T, where X can be any amino acid.
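Because the PKA consensus RRXS/T is a short fixed pattern, candidate sites can be located with a regular expression. The Python sketch below uses a hypothetical helper, `pka_sites`, and is not a tool from the source; it also assumes that matching the consensus is enough to flag a candidate, whereas actual phosphorylation further depends on factors such as structure and accessibility.

```python
import re

# R-R-X-S/T: two arginines, any residue, then the Ser/Thr phosphoacceptor.
# The zero-width lookahead lets overlapping matches (e.g. in "RRRST") all be reported.
PKA_MOTIF = re.compile(r"(?=RR.[ST])")


def pka_sites(seq: str) -> list[int]:
    """Return 1-based positions of candidate PKA phosphoacceptor residues."""
    seq = seq.upper()
    # m.start() is the 0-based index of the first R; the acceptor lies 3 residues later,
    # so its 1-based position is m.start() + 4.
    return [m.start() + 4 for m in PKA_MOTIF.finditer(seq)]


print(pka_sites("LRRASLG"))  # Kemptide, a widely used synthetic PKA substrate
print(pka_sites("RRRSTT"))   # two overlapping consensus matches
```

For Kemptide the helper reports the serine at position 5; for "RRRSTT" it reports both the serine at 4 and the threonine at 5, which a non-overlapping search would miss.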

Casein kinase II is another example of a holoenzyme consisting of catalytic and regulatory subunits. Other kinases that are activated by second messengers are cGMP-dependent protein kinase and protein kinase C (PKC), which is activated by diacylglycerol, which in turn is produced by phospholipases through cleavage of phosphatidylcholine.

Receptor kinases usually consist of an extracellular domain, which can bind effector molecules (e.g. growth factors and hormones) and transfer the stimulus to the intracellular domain of these proteins, which is usually a protein tyrosine kinase. Other tyrosine kinases lack an extracellular domain but are associated with receptors, which transfer the signal after effector binding by activating the associated protein kinase (e.g. the Src kinase family: Src, Blk, Fgr, Fyn, Lck, Lyn, Yes; and the Janus kinase family: Jak1-3, Tyk2).

Dysfunction of kinases, e.g. caused by defective regulation, can cause inflammatory diseases and uncontrolled proliferation. v-Src, a truncated version of the c-Src proto-oncogene tyrosine kinase, is a classical example of this process: v-Src lacks the regulatory domain of the cellular gene and is therefore constitutively active.

Several categories of proteins within the overall group of "Signal transduction" are coded for by clones of the invention; they include, among others, the following:






Discs-large family: In Drosophila more than 50 genes have been described in which mutation leads to loss of cell proliferation control, indicating that they are tumor suppressor genes. Most of these genes have mammalian homologs. The Drosophila 'discs large' tumor suppressor protein, Dlg, is the prototype of a family of proteins termed MAGUKs (membrane-associated guanylate kinase homologs). MAGUKs are localized at the membrane-cytoskeleton interface, usually at cell-cell junctions, where they appear to have both structural and signaling roles. They contain several distinct domains, including a modified guanylate kinase domain, an SH3 motif, and one or three copies of the DHR (GLGF/PDZ) domain. Recessive lethal mutations in the 'discs large' tumor suppressor gene interfere with the formation of septate junctions (thought to be the arthropod equivalent of tight junctions) between epithelial cells, and they also cause neoplastic overgrowth of imaginal discs, suggesting a role for cell junctions in proliferation control. These proteins can find application in modulating/blocking the guanylate cyclase pathway. Clones in this category include: amy2_12d7.

Proteins with a WW domain: Proteins in this category contain a WW domain, which was originally described as a short conserved region in a number of unrelated proteins, among them dystrophin, the product of the gene responsible for Duchenne muscular dystrophy. The domain, which spans about 35 residues, is repeated up to 4 times in some proteins. It has been shown to bind proteins with particular proline motifs, [AP]-P-P-[AP]-Y, and thus somewhat resembles SH3 domains. This domain is frequently associated with other domains typical of proteins involved in signal transduction processes. Examples of proteins containing the WW domain are dystrophin, utrophin, vertebrate YAP protein (binds the SH3 domain of the Yes oncoprotein), murine NEDD-4 (embryonic development and differentiation of the central nervous system), and IQGAP (human GTPase-activating protein acting on ras). These proteins are therefore expected to be involved in intracellular signal transduction. Diseases associated (as potentially diagnostic, therapeutic, causative, and/or related, etc.) with these proteins include, as reported by OMIM: 1) Muscular Dystrophy, Pseudohypertrophic Progressive, Duchenne and Becker Types (OMIM *310200). Clones in this category include: tes3_lldEl.
Ion transporters: For signaling, stringent control of ion fluxes across biological membranes is essential. Several transmembrane ion-channel proteins are key elements of signal transduction pathways. Clones in this category include: amy2_lDp7 and amy2_2flfl.

RING-finger proteins: A zinc finger motif of the C3HC4 type (the so-called RING finger domain) is involved in mediating protein-protein interactions. Proteins containing a RING finger include mammalian V(D)J recombination activating protein (RAG1), mouse rpt-1, human rfp, the human 52 kDa Ro/SS-A protein, and others. The family of RING finger proteins contains a number of oncogenes, for example PML (a probable transcription factor), BRCA1, and the mammalian cbl and bmi-1 proto-oncogenes. Clones in this category include: amy2_10hl?.

Phosphatases: Proper targeting of protein tyrosine phosphatases (PTPs) is essential for many cellular signaling events, including antigen-induced proliferative responses of B and T cells. The physiological significance of PTPs is further revealed by mouse gene knockout studies and by human genome sequencing and mapping projects. Several PTPs have been shown to be critical in the pathogenesis of human diseases, as shown by over STD entries in OMIM. Clones in this category include: tes3_31j20.

Phosphoproteins: Some paraneoplastic syndromes affecting the nervous system are associated with antibodies that react with neuronal proteins and with the causal tumor (onconeuronal antigens). Several of these antibodies are markers of specific neurologic syndromes associated with distinct types of cancer. One of the antigens recognized by such antibodies is Ma1, the neuron- and testis-specific protein 1. Expression of Ma1 mRNA is highly restricted to the brain and testis. Subsequent analysis suggested that Ma1 is likely to be a phosphoprotein (see OMIM *604010). Clones in this category include: tes3_5k22.

Transmembrane proteins 

Membrane region prediction was performed using the ALOM2 software (Klein et al., 1985; version 2 by K. Nakai). As in many other methods, the Kyte & Doolittle (1982) amino acid hydrophobicity scale is used in ALOM2 as the primary variable for classifying sequences in terms of their localization. High prediction accuracy is achieved through a system of intelligent decision rules and a carefully selected training data set. The method also generates reliability estimates, which make it possible to distinguish between membrane-spanning proteins (I, intrinsic) and globular proteins with regions of high hydrophobicity buried in the core.

For a protein of length L, the window (block) of length l with maximum hydrophobicity is found:

$$\mathrm{maxH} = \max_{1 \le k \le L-l+1} \frac{1}{l} \sum_{i=k}^{k+l-1} H_i$$

where H_i represents the hydrophobicity of an individual residue.
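The window search for maxH can be sketched in a few lines of Python. This is an illustrative sketch, not ALOM2 itself: it assumes that maxH is the mean (rather than the sum) of the Kyte-Doolittle values H_i over a window of 17 residues, and the function name `max_hydrophobicity` is hypothetical.

```python
# Kyte-Doolittle (1982) hydropathy values per residue.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def max_hydrophobicity(seq: str, window: int = 17) -> tuple[float, int]:
    """Return (maxH, start): the highest mean H_i over any `window` consecutive
    residues and the 0-based offset of that window (first one on ties)."""
    h = [KD[aa] for aa in seq.upper()]
    if len(h) < window:
        raise ValueError("sequence is shorter than the window")
    total = sum(h[:window])
    best_total, best_start = total, 0
    # Slide the window one residue at a time, updating the running sum in O(1).
    for start in range(1, len(h) - window + 1):
        total += h[start + window - 1] - h[start - 1]
        if total > best_total:
            best_total, best_start = total, start
    return best_total / window, best_start


print(max_hydrophobicity("D" * 5 + "I" * 17 + "D" * 5))  # the all-Ile window wins
```

The running sum keeps the scan linear in the protein length, so every window is scored without re-summing 17 values each time.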
Let P(I|maxH) and P(E|maxH) be the conditional probabilities that a protein is integral or peripheral, respectively, given its value of maximal hydrophobicity maxH, and let P(I) and P(E) be the prior probabilities of intrinsic and extrinsic membrane proteins estimated from the training set. Then a sequence is assigned to E if

$$P(E \mid \mathrm{maxH}) > P(I \mid \mathrm{maxH})$$

or, after applying Bayes' rule,

$$P(E)\,P(\mathrm{maxH} \mid E) > P(I)\,P(\mathrm{maxH} \mid I)$$

where the conditional probabilities P(maxH|E) and P(maxH|I) can be determined from estimates of the probability distributions of maxH in both groups.
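The Bayes decision can be illustrated with a short Python sketch. It assumes, purely for illustration, that maxH is normally distributed within each group; the priors, means, and standard deviations below are hypothetical placeholders and not the ALOM2 training estimates, and the `min_odds_i` parameter is an assumed way of expressing a more stringent odds requirement for calling a protein integral.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of the normal distribution N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def classify(max_h, prior_e, prior_i, dist_e, dist_i, min_odds_i=1.0):
    """Return 'I' (integral) or 'E' (peripheral) for a given maxH.

    dist_e / dist_i are (mean, sd) of maxH in each training group (assumed normal).
    The posterior odds P(I|maxH) : P(E|maxH) equal P(I)p(maxH|I) : P(E)p(maxH|E),
    so the evidence P(maxH) never has to be computed. min_odds_i = 1 reproduces
    the plain Bayes rule; larger values demand stronger evidence before calling 'I'.
    """
    weight_i = prior_i * normal_pdf(max_h, *dist_i)
    weight_e = prior_e * normal_pdf(max_h, *dist_e)
    return "I" if weight_i > min_odds_i * weight_e else "E"


# Hypothetical, illustrative parameters only.
PRIORS = dict(prior_e=0.7, prior_i=0.3)
DISTS = dict(dist_e=(0.5, 0.5), dist_i=(2.0, 0.5))

print(classify(2.5, **PRIORS, **DISTS))                   # strongly hydrophobic window
print(classify(1.6, **PRIORS, **DISTS))                   # borderline, plain Bayes rule
print(classify(1.6, **PRIORS, **DISTS, min_odds_i=10.0))  # same value, stricter odds
```

With these placeholder values the borderline case at maxH = 1.6 is called integral under the plain rule (posterior odds of roughly 3.5 : 1) but peripheral once tenfold odds are demanded. Because both illustrative distributions share one standard deviation, the log-odds here are linear in maxH; unequal variances would make them quadratic, which mirrors the linear and quadratic discriminant inequalities in the text.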

Discriminant analysis simplifies this task by expressing the odds $P(E \mid \mathrm{maxH}) : P(I \mid \mathrm{maxH})$ as $e^{b}$, where $b$ is the left-hand side of a linear or quadratic inequality. For example, for a window of length 17, the protein is allocated to the peripheral category E based on the empirically derived quadratic inequality:



l-DSCmaxH^+lH-BOmaxH+l?-^ >□-, 

whereas the optimal inequality for assigning membrane proteins (category I) is linear:

-T.02maxH + m.S? > □ 

The odds parameter can be made more or less stringent. For
example, one can require odds of at least 1:10 for a protein to be
classified as integral. This leads to higher selectivity but lower
sensitivity.
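The Bayes decision and the adjustable odds threshold can be illustrated with a small Python sketch. It assumes, purely for illustration, that maxH is normally distributed within each group; the priors, means and standard deviations in `PARAMS` are invented and the function names are hypothetical, since ALOM2 itself relies on empirically derived discriminants. Under this assumption the log-odds b = ln[P(E/maxH) / P(I/maxH)] is quadratic in maxH when the two groups have different variances and linear when they share one, mirroring the two kinds of inequality above. The stringency rule is read here as requiring P(I/maxH) : P(E/maxH) to reach a chosen ratio before a protein is called integral.

```python
import math


def gaussian_logpdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution N(mean, sd^2) at x."""
    return -0.5 * math.log(2.0 * math.pi * sd * sd) - (x - mean) ** 2 / (2.0 * sd * sd)


def log_odds_e_vs_i(max_h, prior_e, mean_e, sd_e, mean_i, sd_i):
    """b = ln[P(E)P(maxH/E)] - ln[P(I)P(maxH/I)] = ln[P(E/maxH) / P(I/maxH)]."""
    prior_i = 1.0 - prior_e
    return (math.log(prior_e) + gaussian_logpdf(max_h, mean_e, sd_e)
            - math.log(prior_i) - gaussian_logpdf(max_h, mean_i, sd_i))


def classify(max_h, params, min_odds_integral=1.0):
    """Return 'I' if P(I/maxH) : P(E/maxH) >= min_odds_integral, else 'E'.

    min_odds_integral = 1 is the plain Bayes rule; larger values make the
    integral call more stringent (higher selectivity, lower sensitivity).
    """
    b = log_odds_e_vs_i(max_h, **params)
    # The odds E:I equal e^b, so the odds I:E equal e^(-b).
    return "I" if -b >= math.log(min_odds_integral) else "E"


# Hypothetical training-set estimates (maxH as a mean Kyte & Doolittle value).
PARAMS = {"prior_e": 0.7, "mean_e": 0.5, "sd_e": 0.5, "mean_i": 2.0, "sd_i": 0.4}

print(classify(1.5, PARAMS))                        # I  (plain Bayes rule)
print(classify(1.5, PARAMS, min_odds_integral=10))  # E  (odds of at least 10:1 required)
```

Raising `min_odds_integral` from 1 to 10 turns the borderline value maxH = 1.5 from an integral into a peripheral call, which is the trade of sensitivity for selectivity described in the text.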

The boundaries of membrane-spanning regions in putative
membrane proteins are detected by means of an iterative procedure
whereby the most hydrophobic region corresponding to the value
maxH is considered to be membrane and removed from the sequence.
The classification procedure is then repeated for the
remaining sequence, and, if such a protein is again classified as
integral, the next most hydrophobic region is considered.
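The iterative search for membrane-spanning segments can be sketched as follows. This is a simplified illustration rather than the published procedure: the E/I decision is replaced by a hypothetical fixed cutoff (`threshold`) on the mean Kyte & Doolittle value of the most hydrophobic window, and the function names are illustrative. Each accepted window is recorded in original sequence coordinates, then excised, and the search is repeated on the remaining residues.

```python
# Kyte & Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def most_hydrophobic(values, window):
    """Return (mean score, start) of the highest-scoring window in `values`."""
    running = sum(values[:window])
    best, best_start = running, 0
    for i in range(1, len(values) - window + 1):
        running += values[i + window - 1] - values[i - 1]
        if running > best:
            best, best_start = running, i
    return best / window, best_start


def find_tm_segments(seq, window=17, threshold=1.6):
    """Iteratively excise the most hydrophobic window while it still scores as membrane.

    Returns (first, last) residue indices (0-based, inclusive) in the original
    sequence. A window formed across an earlier excision is reported by its
    outermost original positions.
    """
    residues = [KD[aa] for aa in seq.upper()]
    positions = list(range(len(residues)))  # original index of each remaining residue
    segments = []
    while len(residues) >= window:
        max_h, start = most_hydrophobic(residues, window)
        if max_h < threshold:  # hypothetical stand-in for the E/I decision
            break
        segments.append((positions[start], positions[start + window - 1]))
        del residues[start:start + window]
        del positions[start:start + window]
    return sorted(segments)


example = "K" * 10 + "L" * 17 + "K" * 10 + "I" * 17 + "K" * 10
print(find_tm_segments(example))  # [(10, 26), (37, 53)]
```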
Reference: Klein, P., Kanehisa, M., DeLisi, C. (1985) The
detection and classification of membrane-spanning proteins.
Biochim Biophys Acta 815: 468-476.

Transcription factors 

Purified eukaryotic RNA polymerase II is unable to initiate 
promoter-specific transcription. A family of factors that 
collectively confer RNAPII promoter specificity is known as the 
general transcription factors (GTFs). They include the
TATA-binding protein (TBP), TFIIB, TFIIE, TFIIF and TFIIH. These
factors are conserved among all eukaryotes. 

RNAPII complexes containing the entire set of GTFs or a 
subset of GTFs together with other proteins have been isolated 
from mammalian and yeast cells. Although purified RNAPII and GTFs
are sufficient for promoter-specific initiation, this system
fails to respond to activators. Responsiveness to activators is
mediated by a further complex, termed the mediator complex, which
associates with the carboxy-terminal heptapeptide repeat domain
(CTD) of the largest subunit of RNAPII.



Purification of human RNAPII complexes, followed by analysis
of their functional properties, revealed two distinct forms of
human RNAPII. One complex contained chromatin remodeling
activities but was devoid of GTFs. The other complex did not
contain factors that modify chromatin but contained a subset of
SRB/mediator subunits, GTFs and other polypeptides that mediate
transcriptional activation, a scenario similar to that reported
for yeast.

A complex designated NAT (~20 subunits), for negative
regulator of transcription, contains RNAPII, Cdk8 and homologs of
the yeast mediator complex, as well as Rgr1 and Srb10/11, known
as negative regulators of transcription.

A complex with strikingly similar structural and functional
properties to NAT has been identified and designated SMCC
(~15 subunits; SRB/mediator coactivator complex); it can also
mediate transcriptional activation.

The SMCC complex includes all reported NAT subunits,
including subunits of the TRAP complex. TRAP is a coactivator
complex isolated on the basis of its interaction with the thyroid
hormone receptor. Another coactivator complex, DRIP, isolated on
the basis of its ability to interact with the vitamin D3
receptor, contains novel subunits as well as subunits of the
NAT/SMCC and TRAP complexes.

The effects of each of these coactivator complexes are
dependent on the TFIID complex. It is not known whether the TAF
subunits of TFIID are required. It is likely that new
coactivator complexes containing both novel and previously
defined components will be uncovered.

Besides the large number of transcription factors that can
be part of the RNAPII holoenzyme or the coactivator complexes,
there is an even larger number of specific transcription factors
binding to promoter elements within the DNA sequence of a given
gene, leading to activation or repression of transcription. A
broad range of cellular responses, such as differentiation,
proliferation and cell death, are elicited through activating or
repressing the transcription of target genes.

WO 01/98454 PCT/IB01/02050

There are at least five superclasses of transcription
factors:

1. Superclass contains members with characteristic basic
domains:

Members are: 

Leucine zipper factors, where the basic domain is followed
by a leucine zipper of leucine residues repeated at every seventh
position. The zipper mediates protein dimerization as a
prerequisite for DNA binding.
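The seven-residue spacing of the zipper leucines lends itself to a simple sequence scan. The sketch below is an idealized illustration, not a validated predictor: it assumes every seventh position must be exactly leucine (natural zippers tolerate substitutions) and that at least four such leucines are required; the function name and the `min_repeats` default are hypothetical.

```python
from __future__ import annotations


def heptad_leucine_runs(seq: str, min_repeats: int = 4) -> list[tuple[int, int]]:
    """Return (start, count) for runs of leucines spaced exactly seven residues apart."""
    seq = seq.upper()
    hits = []
    for start, aa in enumerate(seq):
        if aa != "L":
            continue
        if start >= 7 and seq[start - 7] == "L":
            continue  # this leucine continues a run that began earlier
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            hits.append((start, count))
    return hits


print(heptad_leucine_runs("AAAAAAL" * 4 + "AAA"))  # [(6, 4)]
```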

Helix-loop-helix factors (bHLH) contain a DNA-binding basic
region followed by a motif of two potentially amphipathic
alpha-helices connected by a loop of variable length, which also
mediates dimerization.

Factors with a combination of helix-loop-helix and leucine
zipper motifs.
Further members of this superclass are NF-1, RF-X and
bHSH-like proteins.

2. Superclass comprises factors containing zinc-coordinating
DNA-binding domains.

Members are: 

Proteins with Cys4 zinc fingers of the nuclear receptor type,
where two such motifs differing in size, composition and function
are present in each receptor molecule. Each finger comprises 4
cysteine residues coordinating one zinc ion. The second half,
including the second cysteine pair, has an alpha-helical
conformation, and the helix of the first finger binds to the DNA
through the major groove. The sequence between the first two
cysteines of the second finger mediates dimerization upon DNA
binding. This class includes the steroid hormone receptors and
the thyroid hormone receptor-like factors. Other diverse Cys4 zinc
fingers have a motif of the GATA type.



Proteins with Cys2His2 zinc finger domain(s). Each finger
comprises 2 cysteine and 2 histidine residues coordinating one
zinc ion, and in some cases one histidine is replaced by another
cysteine. The zinc ion is essential for DNA binding.
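A Cys2His2 finger can be located in a protein sequence with a consensus pattern. The sketch below assumes the PROSITE consensus PS00028 (ZINC_FINGER_C2H2_1), C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H, translated into a Python regular expression; it does not cover the variants in which a histidine is replaced by cysteine, the function name is hypothetical, and the test sequence is synthetic rather than a real protein.

```python
import re

# PROSITE PS00028: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(seq: str) -> list:
    """Return (start, end) spans (0-based, end exclusive) of non-overlapping matches."""
    return [(m.start(), m.end()) for m in C2H2_PATTERN.finditer(seq.upper())]


synthetic = "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H"
print(find_c2h2_fingers(synthetic))  # [(0, 21)]
```

Because `finditer` returns non-overlapping matches, closely packed or overlapping candidate fingers would need a lookahead-based scan instead.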

Proteins with Cys6 cysteine-zinc cluster(s). Six cysteine
residues coordinate two zinc ions, i.e., two of the thiol groups
each coordinate both zinc ions. Such clusters are present in many
fungal regulators.

Zinc fingers of alternating composition. 

3. Superclass contains factors of helix-turn-helix type. 

Members are: 

Proteins with homeo domains. Homeo domains consist of three
consecutive alpha-helices. Helix 3 contacts mainly the major
groove of the DNA; some contacts at the minor groove are observed
as well. Helices 2 and 3 resemble the helix-turn-helix structure
of prokaryotic regulators.

Proteins with paired box domain(s). This is a DNA-binding
domain of approximately 130 amino acid residues. Its N-terminal
half is basic; its C-terminal half is generally highly charged.
It probably comprises 3 alpha-helices.

Proteins with fork head/winged helix domain(s). This domain
was identified by homology between HNF-3A and fkh. The domain
comprises approx. 110 amino acids. Analysis of the crystal
structure has revealed a compact structure of three alpha-helices,
the third alpha-helix being exposed towards the major groove of
the DNA. The domain also makes minor groove contacts. Upon binding
to DNA, it induces a bend of 13 degrees.

Heat shock factors

Proteins with tryptophan clusters. The tryptophan clusters
comprise several tryptophan residues with a spacing of 12-21
amino acid residues; the subclass of myb-type DNA-binding domains
typically exhibits a spacing of 19-21 amino acid residues.



Proteins with TEA domain(s). The TEA domain has been
identified as a region that is conserved among the transcription
factors TEF-1, TEC1 and abaA. This domain in TEF-1 has been shown
to interact with DNA, although two additional regions may also
contribute to DNA binding. It is predicted to fold into three
alpha-helices, with a randomly coiled region of 16-18 amino acid
residues between helices 1 and 2, and a short stretch of 3-8
residues between helices 2 and 3.

4. Superclass contains beta-scaffold factors with minor
groove contacts

Members are: 

Proteins with RHR (Rel homology) region.

The structure of the Rel-type DBD exhibits a bipartite
subdomain structure, each subdomain comprising a beta-barrel with
five loops that form an extensive contact surface to the major
groove of the DNA. In particular, the first loop of the N-terminal
subdomain (the highly conserved recognition loop) makes contacts
with the recognition element on the DNA, but other loops are also
involved. The fact that the main DNA contacts are made through
loops has been suggested to provide a high degree of flexibility
in binding to a range of different target sequences. Additional
interactions are achieved by two alpha-helices within the
N-terminal part that form strong minor groove contacts to the
A/T-rich center of the kappaB element. In p65, the sequence between
both alpha-helices is much shorter and helix 2 is even truncated.
The second, C-terminal domain is necessary mainly for protein
dimerization.

p53 proteins

MADS (MCM1-agamous-deficiens-SRF) box proteins. Proteins of
this class comprise a region of homology. The DNA-binding domain
also provides the dimerization capability. In the DNA-bound
dimer (shown for SRF), two antiparallel amphipathic alpha-helices
(alpha-I) form a coiled coil and are oriented approximately
parallel on the minor groove. These helices make minor and major
groove contacts; the N-terminal extensions form minor groove
contacts. The bound DNA is bent and wrapped around the protein.
It exhibits a compressed minor groove in the center and a widened
minor groove in the flanks.

Beta-barrel alpha-helix transcription factors.

TATA-binding proteins 

HMG proteins

Proteins of this class comprise a region of homology with
the chromosomal non-histone HMG proteins such as HMG1. This
region comprises the DNA-binding domain, which in some instances,
such as HMG1, mediates sequence-unspecific binding and in other
cases, such as LEF-1, sequence-specific binding to DNA. This
domain exhibits a typical L-shaped conformation made up of 3
alpha-helices and an extended N-terminal extension of the first
helix. The latter, together with helix 3, which contains a kink,
forms the long arm of the L, whereas helices 1 and 2 form the
short arm. Binding to the minor groove induces a sharp bending of
the DNA by more than

degrees, away from the bound protein. The overall topology of
the DNA-protein complex somewhat resembles that of the TBP-TATA
box complex.

Heteromeric CCAAT factors 

Proteins with Grainyhead domain(s) 

Cold-shock domain factors. Cold-shock domain proteins are
characterized by a highly conserved region first found in
prokaryotic cold-shock proteins. This domain is a single-stranded
nucleic acid-binding structure interacting with DNA or RNA. It
consists of an antiparallel five-stranded beta-barrel, the
strands of which are connected by turns and loops. Within this
structure, a three-stranded beta-sheet contains a conserved
RNA-binding motif, RNP1. Not all CSD proteins are transcription
factors. Those that specifically bind to a certain sequence are
termed Y-box proteins. Proteins of this class were previously
called protamine-like domain proteins because they have a highly
positively charged domain with interspersed proline residues.



Proteins with Runt homology domain 

The members of this transcription factor class have been
identified on the basis of their homology to a defined region
within the Drosophila protein Runt. The runt domain is part of
the DNA-binding domain of these factors. It consists mainly of
beta-strands, does not contain alpha-helical regions and seems to
be most similar to the palm domain found in DNA polymerase beta
(rat).

5. Superclass contains other transcription factors, such as
copper fist proteins, HMGI(Y), STAT, pocket domain proteins and
AP2/EREBP-related factors.

The classification of transcription factors originates from
the TRANSFAC database:

http://transfac.gbf.de/TRANSFAC/

Reference: Heinemeyer

Several categories of proteins are coded for by clones of
the invention within the overall group of "Transcription Factors"
and include, among others, the following:

Homeobox proteins: Homeodomain-containing transcription
factors are essential for a variety of processes in vertebrate
development, including organogenesis. They have been shown to
regulate cell proliferation, pattern segmental identity and
determine cell fate decisions during embryogenesis. For
example, in zebrafish, emx2 mRNAs are found in the dorsal
telencephalon, parts of the diencephalon and the otocyst. The
human homologue Emx2 appears to be expressed already in 8.5-day
embryos. It is also expressed in the presumptive cerebral cortex,
the olfactory bulbs and some neuroectodermal areas of the
embryonic head, including the olfactory placodes at earlier stages
and the olfactory epithelia later in development. Mutants of the
D. melanogaster gene "empty spiracles" display spiracles devoid of
filzkörper, no antennae and an open head. Clones in this category
include: amy2_l^nilb.

Proteins with myc-type helix-loop-helix dimerization domain
signature(s). This helix-loop-helix domain mediates protein
dimerization and has been found in various multimeric
transcription factors. Clones in this category include: tes3_lflnm.

Transcriptional silencers: In addition to transcription
factors, other proteins, such as YDL153c of Saccharomyces
cerevisiae, are responsible for silencing of genes. Clones in this
category include: amy2_2f22.

Proteins regulating transcription factors: The activity of
several transcription factors is regulated by the binding or
dissociation of other proteins or by phosphorylation or
dephosphorylation of the transcription factor. For example,
I-kappa-B-related protein interacts with the transcription factor
NF-kB. I-kappa-B-alpha mutations contribute to constitutive
NF-kappaB activity in cultured and primary HRS (Hodgkin/Reed-
Sternberg) cells and are therefore involved in the pathogenesis
of Hodgkin's disease (HD) in patients. Clones in this category
include: amy2_lcl2.

Signal transducing proteins: Beta-transducin subunits of
G-proteins contain WD-40 repeats. The beta subunits seem to be
required for the replacement of GDP by GTP as well as for
membrane anchoring and receptor recognition. Due to the zinc
finger, the novel protein seems to be a new molecule involved in
signal transduction and transcription. These proteins have been
reported by OMIM to be associated (as potentially diagnostic,
therapeutic, causative and/or related, etc.) with the following
diseases: 1) essential hypertension (OMIM *13 I 113D). Clones in
this category include: tes3_llcE2.

* * * 

The invention, therefore, specifically contemplates the
following assemblages of materials, which track the
above-identified fourteen functional groupings and are useful in
practicing the profiling aspects of the invention. One type of
assemblage is nucleic acid-based and can include the following
groupings of sequences and their derivatives: all sequences;
human fetal brain sequences; brain derived sequences; human fetal
kidney library sequences; kidney derived sequences; human mammary
carcinoma library sequences; mammary carcinoma derived sequences;
human testis library sequences; testis derived sequences; cell
cycle genes; cell structure and motility genes; differentiation
and development genes; intracellular transport and trafficking
genes; metabolism genes; nucleic acid management genes; signal
transduction genes; transmembrane protein genes; and
transcription factor genes. Other assemblages contain proteins
or their corresponding antibodies or antibody fragments, divided
along the same groupings.

Database Applications 

Because they are human genes and gene products, the
inventive molecules are useful as members of a database. Such a
database may be used, for example, in drug discovery and
rational drug design or in testing the novelty and
non-obviousness of newly sequenced materials. In addition, they are
particularly suited to designing variants for the profiling (and
other) applications described herein. Hence, the following
discussion of electronic embodiments applies equally to such
variants, which, naturally, will be generated and stored on a
computer using known methodologies.

Accordingly, one aspect of the invention contemplates a
database of at least one of the inventive sequences stored on
computer readable media. Again, the individual sequences may be
grouped with regard to the individual functional and structural
groups mentioned above. While the individual sequences of a
database may exist in printed form, they are preferably in
electronic form, as in an ASCII or a text file. They may also
exist as word processing files, or they may be stored in database
applications like DB2, Sybase, Oracle, GCG and GenBank. One
skilled in the art will understand the range of applications
suitable for using and storing the electronic embodiments of the
invention.

"Computer readable media" refers to any medium that can be
read and accessed by a computer. These include magnetic storage
media, like floppy discs, hard drives and magnetic tape; optical
storage media, like CD-ROM; electrical storage media, like RAM
and ROM; and hybrids of these categories, like magnetic/optical
storage media. One skilled in the art will readily understand
the scope of computer readable media and how to implement them.
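A minimal sketch of such an electronic embodiment, assuming Python's built-in `sqlite3` module rather than any of the database applications named above: each sequence is stored together with its functional grouping, and one group can be exported as plain-text FASTA. The table layout, column names and clone identifiers are hypothetical.

```python
import sqlite3


def build_sequence_db(records):
    """Create an in-memory database from (seq_id, group, sequence) tuples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences ("
        " seq_id TEXT PRIMARY KEY,"
        " func_group TEXT NOT NULL,"
        " residues TEXT NOT NULL)"
    )
    # Index the grouping column so that per-group queries stay fast.
    conn.execute("CREATE INDEX idx_group ON sequences(func_group)")
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


def export_group_fasta(conn, group):
    """Return all sequences of one functional group as FASTA text."""
    rows = conn.execute(
        "SELECT seq_id, residues FROM sequences WHERE func_group = ? ORDER BY seq_id",
        (group,),
    )
    return "".join(f">{seq_id}\n{residues}\n" for seq_id, residues in rows)


db = build_sequence_db([
    ("clone_1", "transcription factor", "MKLV"),
    ("clone_2", "signal transduction", "MAAG"),
    ("clone_3", "transcription factor", "MSTE"),
])
print(export_group_fasta(db, "transcription factor"), end="")
```

Keeping the grouping as an indexed column, rather than one table per group, lets a single sequence be queried by any of the groupings without duplicating data.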



Biological Activities and Assays for Implementing Therapeutic and 
Diagnostic Applications 

This section provides assays for biological activity that
are useful in characterizing and quantifying the biological
activity of the inventive molecules and their derivatives, which
is relevant to the pharmacological effects of the inventive
molecules. As used in this section, it will be understood that
"protein" may also refer to the inventive antibodies (including
fragments).

Cytokine and Cell Proliferation/Differentiation Activity 

A protein of the present invention may exhibit cytokine,
cell proliferation (either inducing or inhibiting) or cell
differentiation (either inducing or inhibiting) activity or may
induce production of other cytokines in certain cell populations.
Many protein factors discovered to date, including all known
cytokines, have exhibited activity in one or more factor
dependent cell proliferation assays, and hence the assays serve
as a convenient confirmation of cytokine activity. The activity
of a protein of the present invention is evidenced by any one of
a number of routine factor dependent cell proliferation assays
for cell lines including, without limitation, 32D, DA2, DA1G,
T10, B9, B9/11, BaF3, MC9/G, M+ (preB M+), 2E8, RB5, DA1, 123,
T1165, HT2, CTLL2, TF-1, Mo7e and CMK.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for T-cell or thymocyte proliferation include, without
limitation, those described in: Current Protocols in Immunology,
Ed. by J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M.
Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte
Function 3.1-3.19; Chapter 7, Immunologic studies in Humans);
Takai et al., J. Immunol. 137:3494-3500, 1986; Bertagnolli et
al., J. Immunol. 145:1706-1712, 1990; Bertagnolli et al.,
Cellular Immunology 133:327-341, 1991; Bertagnolli et al., J.
Immunol. 149:3778-3783, 1992; Bowman et al., J. Immunol.
152:1756-1761.

Assays for cytokine production and/or proliferation of
spleen cells, lymph node cells or thymocytes include, without
limitation, those described in: Polyclonal T cell stimulation,
Kruisbeek, A. M. and Shevach, E. M. In Current Protocols in
Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 3.12.1-3.12.14,
John Wiley and Sons, Toronto. 1994; and Measurement of mouse and
human interferon gamma, Schreiber, R. D. In Current Protocols
in Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.8.1-6.8.8,
John Wiley and Sons, Toronto. 1994.

Assays for proliferation and differentiation of
hematopoietic and lymphopoietic cells include, without
limitation, those described in: Measurement of Human and Murine
Interleukin 2 and Interleukin 4, Bottomly, K., Davis, L. S. and
Lipsky, P. E. In Current Protocols in Immunology. J. E. e.a.
Coligan eds. Vol 1 pp. 6.3.1-6.3.12, John Wiley and Sons,
Toronto. 1991; deVries et al., J. Exp. Med. 173:1205-1211, 1991;
Moreau et al., Nature 336:690-692, 1988; Greenberger et al.,
Proc. Natl. Acad. Sci. U.S.A. 80:2931-2938, 1983; Measurement of
mouse and human interleukin 6, Nordan, R. In Current Protocols in
Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.6.1-6.6.5, John
Wiley and Sons, Toronto. 1991; Smith et al., Proc. Natl. Acad.
Sci. U.S.A. 83:1857-1861, 1986; Measurement of human Interleukin
11, Bennett, F., Giannotti, J., Clark, S. C. and Turner, K. J. In
Current Protocols in Immunology. J. E. e.a. Coligan eds. Vol 1
pp. 6.15.1, John Wiley and Sons, Toronto. 1991; Measurement of
mouse and human Interleukin 9, Ciarletta, A., Giannotti, J.,
Clark, S. C. and Turner, K. J. In Current Protocols in
Immunology. J. E. e.a. Coligan eds. Vol 1 pp. 6.13.1, John Wiley
and Sons, Toronto. 1991.

Assays for T-cell clone responses to antigens (which will
identify, among others, proteins that affect APC-T cell
interactions as well as direct T-cell effects by measuring
proliferation and cytokine production) include, without
limitation, those described in: Current Protocols in Immunology,
Ed. by J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M.
Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 3, In Vitro assays for Mouse Lymphocyte
Function; Chapter 6, Cytokines and their cellular receptors;
Chapter 7, Immunologic studies in Humans); Weinberger et al.,
Proc. Natl. Acad. Sci. USA 77:6091-6095, 1980; Weinberger et al.,
Eur. J. Immun. 11:405-411, 1981; Takai et al., J. Immunol.
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988.

Immune Stimulating or Suppressing Activity 
A protein of the present invention may also exhibit immune
stimulating or immune suppressing activity, including without
limitation the activities for which assays are described herein.
A protein may be useful in the treatment of various immune
deficiencies and disorders (including severe combined
immunodeficiency (SCID)), e.g., in regulating (up or down) growth
and proliferation of T and/or B lymphocytes, as well as affecting
the cytolytic activity of NK cells and other cell populations.
These immune deficiencies may be genetic or be caused by viral
(e.g., HIV) as well as bacterial or fungal infections, or may
result from autoimmune disorders. More specifically, infectious
diseases caused by viral, bacterial, fungal or other infection
may be treatable using a protein of the present invention,
including infections by HIV, hepatitis viruses, herpesviruses,
mycobacteria, Leishmania spp., malaria spp. and various fungal
infections such as candidiasis. Of course, in this regard, a
protein of the present invention may also be useful where a boost
to the immune system generally may be desirable, i.e., in the
treatment of cancer.

Autoimmune disorders which may be treated using a protein of
the present invention include, for example, connective tissue
disease, multiple sclerosis, systemic lupus erythematosus,
rheumatoid arthritis, autoimmune pulmonary inflammation,
Guillain-Barre syndrome, autoimmune thyroiditis, insulin
dependent diabetes mellitus, myasthenia gravis, graft-versus-host
disease and autoimmune inflammatory eye disease. Such a protein
of the present invention may also be useful in the treatment
of allergic reactions and conditions, such as asthma
(particularly allergic asthma) or other respiratory problems.
Other conditions in which immune suppression is desired
(including, for example, organ transplantation) may also be
treatable using a protein of the present invention.
Using the proteins of the invention, it may also be possible
to modify immune responses in a number of ways. Down regulation
may be in the form of inhibiting or blocking an immune response
already in progress or may involve preventing the induction of an
immune response. The functions of activated T cells may be
inhibited by suppressing T cell responses or by inducing specific
tolerance in T cells, or both. Immunosuppression of T cell
responses is generally an active, non-antigen-specific process
which requires continuous exposure of the T cells to the
suppressive agent. Tolerance, which involves inducing
non-responsiveness or anergy in T cells, is distinguishable from
immunosuppression in that it is generally antigen-specific and
persists after exposure to the tolerizing agent has ceased.
Operationally, tolerance can be demonstrated by the lack of a T
cell response upon reexposure to specific antigen in the absence
of the tolerizing agent.

Down regulating or preventing one or more antigen functions
(including without limitation B lymphocyte antigen functions
(such as, for example, B7)), e.g., preventing high level
lymphokine synthesis by activated T cells, will be useful in
situations of tissue, skin and organ transplantation and in
graft-versus-host disease (GVHD). For example, blockage of T cell
function should result in reduced tissue destruction in tissue
transplantation. Typically, in tissue transplants, rejection of
the transplant is initiated through its recognition as foreign by
T cells, followed by an immune reaction that destroys the
transplant. The administration of a molecule which inhibits or
blocks interaction of a B7 lymphocyte antigen with its natural
ligand(s) on immune cells (such as a soluble, monomeric form of a
peptide having B7-2 activity alone or in conjunction with a
monomeric form of a peptide having an activity of another B
lymphocyte antigen (e.g., B7-1, B7-3) or blocking antibody),
prior to transplantation can lead to the binding of the molecule
to the natural ligand(s) on the immune cells without transmitting
the corresponding costimulatory signal. Blocking B lymphocyte
antigen function in this manner prevents cytokine synthesis by
immune cells, such as T cells, and thus acts as an
immunosuppressant. Moreover, the lack of costimulation may also
be sufficient to anergize the T cells, thereby inducing tolerance
in a subject. Induction of long-term tolerance by B lymphocyte
antigen-blocking reagents may avoid the necessity of repeated
administration of these blocking reagents. To achieve sufficient
immunosuppression or tolerance in a subject, it may also be
necessary to block the function of a combination of B lymphocyte
antigens.

The efficacy of particular blocking reagents in preventing
organ transplant rejection or GVHD can be assessed using animal
models that are predictive of efficacy in humans. Examples of
appropriate systems which can be used include allogeneic cardiac
grafts in rats and xenogeneic pancreatic islet cell grafts in
mice, both of which have been used to examine the
immunosuppressive effects of CTLA4Ig fusion proteins in vivo as
described in Lenschow et al., Science 257:789-792 (1992) and
Turka et al., Proc. Natl. Acad. Sci. USA 89:11102-11105 (1992).
In addition, murine models of GVHD (see Paul ed., Fundamental
Immunology, Raven Press, New York, 1989, pp. 846-847) can be used
to determine the effect of blocking B lymphocyte antigen function
in vivo on the development of that disease.

Blocking antigen function may also be therapeutically useful
for treating autoimmune diseases. Many autoimmune disorders are
the result of inappropriate activation of T cells that are
reactive against self tissue and which promote the production of
cytokines and autoantibodies involved in the pathology of the
diseases. Preventing the activation of autoreactive T cells may
reduce or eliminate disease symptoms. Administration of reagents
which block costimulation of T cells by disrupting
receptor:ligand interactions of B lymphocyte antigens can be used
to inhibit T cell activation and prevent production of
autoantibodies or T cell-derived cytokines which may be involved
in the disease process. Additionally, blocking reagents may
induce antigen-specific tolerance of autoreactive T cells which
could lead to long-term relief from the disease. The efficacy of
blocking reagents in preventing or alleviating autoimmune
disorders can be determined using a number of well-characterized
animal models of human autoimmune diseases. Examples include
murine experimental autoimmune encephalitis, systemic lupus
erythematosus in MRL/lpr/lpr mice or NZB hybrid mice, murine
autoimmune collagen arthritis, diabetes mellitus in NOD mice and
BB rats, and murine experimental myasthenia gravis (see Paul ed.,
Fundamental Immunology, Raven Press, New York, 1989, pp. 840-
Upregulation of an antigen function (preferably a B
lymphocyte antigen function), as a means of upregulating immune
responses, may also be useful in therapy. Upregulation of immune
responses may be in the form of enhancing an existing immune
response or eliciting an initial immune response. For example,
enhancing an immune response through stimulating B lymphocyte
antigen function may be useful in cases of viral infection. In
addition, systemic viral diseases such as influenza, the common
cold, and encephalitis might be alleviated by the administration
of stimulatory forms of B lymphocyte antigens systemically.

Alternatively, anti-viral immune responses may be enhanced
in an infected patient by removing T cells from the patient,
costimulating the T cells in vitro with viral antigen-pulsed APCs
either expressing a peptide of the present invention or together
with a stimulatory form of a soluble peptide of the present
invention, and reintroducing the in vitro activated T cells into
the patient. Another method of enhancing anti-viral immune
responses would be to isolate infected cells from a patient,
transfect them with a nucleic acid encoding a protein of the
present invention as described herein such that the cells express
all or a portion of the protein on their surface, and reintroduce
the transfected cells into the patient. The infected cells would
then be capable of delivering a costimulatory signal to, and
thereby activating, T cells in vivo.

In another application, upregulation or enhancement of
antigen function (preferably B lymphocyte antigen function) may
be useful in the induction of tumor immunity. Tumor cells (e.g.,
sarcoma, melanoma, lymphoma, leukemia, neuroblastoma, carcinoma)
transfected with a nucleic acid encoding at least one peptide of
the present invention can be administered to a subject to
overcome tumor-specific tolerance in the subject. If desired, the
tumor cell can be transfected to express a combination of
peptides. For example, tumor cells obtained from a patient can be
transfected ex vivo with an expression vector directing the
expression of a peptide having B7-2-like activity alone, or in
conjunction with a peptide having B7-1-like activity and/or
B7-3-like activity. The transfected tumor cells are returned to the
patient to result in expression of the peptides on the surface of
the transfected cell. Alternatively, gene therapy techniques can
be used to target a tumor cell for transfection in vivo.

The presence of the peptide of the present invention having
the activity of a B lymphocyte antigen(s) on the surface of the
tumor cell provides the necessary costimulation signal to T cells
to induce a T cell mediated immune response against the
transfected tumor cells. In addition, tumor cells which lack MHC
class I or MHC class II molecules, or which fail to reexpress
sufficient amounts of MHC class I or MHC class II molecules, can
be transfected with nucleic acid encoding all or a portion
(e.g., a cytoplasmic-domain truncated portion) of an MHC class I
alpha chain protein and beta 2 microglobulin protein, or an MHC
class II alpha chain protein and an MHC class II beta chain
protein, to thereby express MHC class I or MHC class II proteins
on the cell surface. Expression of the appropriate class I or
class II MHC in conjunction with a peptide having the activity of
a B lymphocyte antigen (e.g., B7-1, B7-2, B7-3) induces a T cell
mediated immune response against the transfected tumor cell.
Optionally, a gene encoding an antisense construct which blocks
expression of an MHC class II associated protein, such as the
invariant chain, can also be cotransfected with a DNA encoding a
peptide having the activity of a B lymphocyte antigen to promote
presentation of tumor associated antigens and induce tumor
specific immunity. Thus, the induction of a T cell mediated
immune response in a human subject may be sufficient to overcome
tumor-specific tolerance in the subject.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Suitable assays for thymocyte or splenocyte cytotoxicity
include, without limitation, those described in: Current
Protocols in Immunology, Ed by J. E. Coligan, A. M. Kruisbeek,
D. H. Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience (Chapter 3, In Vitro assays for
Mouse Lymphocyte Function 3.1-3.19; Chapter 7, Immunologic
studies in Humans); Herrmann et al., Proc. Natl. Acad. Sci. USA
78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et
al., J. Immunol. 137:3494-3500, 1986; Takai et al., J. Immunol.
140:508-512, 1988; Herrmann et al., Proc. Natl. Acad. Sci. USA
78:2488-2492, 1981; Herrmann et al., J. Immunol. 128:1968-1974,
1982; Handa et al., J. Immunol. 135:1564-1572, 1985; Takai et
al., J. Immunol. 137:3494-3500, 1986; Bowman et al., J. Virology
61:1992-1998; Takai et al., J. Immunol. 140:508-512, 1988;
Bertagnolli et al., Cellular Immunology 133:327-341, 1991; Brown
et al., J. Immunol. 153:3079-3092, 1994.

Assays for T-cell-dependent immunoglobulin responses and
isotype switching (which will identify, among others, proteins
that modulate T-cell dependent antibody responses and that affect
Th1/Th2 profiles) include, without limitation, those described
in: Maliszewski, J. Immunol. 144:3028-3033, 1990; and Assays for
B cell function: In vitro antibody production, Mond, J. J. and
Brunswick, M. In Current Protocols in Immunology. J. E. e.a.
Coligan eds. Vol 1 pp. 3.8.1-3.8.16, John Wiley and Sons,
Toronto. 1994.

Mixed lymphocyte reaction (MLR) assays (which will identify,
among others, proteins that generate predominantly Th1 and CTL
responses) include, without limitation, those described in:
Current Protocols in Immunology, Ed by J. E. Coligan, A. M.
Kruisbeek, D. H. Margulies, E. M. Shevach, W. Strober, Pub.
Greene Publishing Associates and Wiley-Interscience (Chapter 3,
In Vitro assays for Mouse Lymphocyte Function 3.1-3.19; Chapter
7, Immunologic studies in Humans); Takai et al., J. Immunol.
137:3494-3500, 1986; Takai et al., J. Immunol. 140:508-512, 1988;
Bertagnolli et al., J. Immunol. 149:3778-3783, 1992.

Dendritic cell-dependent assays (which will identify, among
others, proteins expressed by dendritic cells that activate naive
T-cells) include, without limitation, those described in: Guery
et al., J. Immunol. 134:536-544, 1995; Inaba et al., Journal of
Experimental Medicine 173:549-559, 1991; Macatonia et al.,
Journal of Immunology 154:5071-5079, 1995; Porgador et al.,
Journal of Experimental Medicine 182:255-260, 1995; Nair et al.,
Journal of Virology 67:4062-4069, 1993; Huang et al., Science
264:961-965, 1994; Macatonia et al., Journal of Experimental
Medicine 169:1255-1264, 1989; Bhardwaj et al., Journal of
Clinical Investigation 94:797-807, 1994; and Inaba et al.,
Journal of Experimental Medicine 172:631-640, 1990.
Assays for lymphocyte survival/apoptosis (which will
identify, among others, proteins that prevent apoptosis after
superantigen induction and proteins that regulate lymphocyte
homeostasis) include, without limitation, those described in:
Darzynkiewicz et al., Cytometry 13:795-808, 1992; Gorczyca et
al., Leukemia 7:659-670, 1993; Gorczyca et al., Cancer Research
53:1945-1951, 1993; Itoh et al., Cell 66:233-243, 1991;
Zacharchuk, Journal of Immunology 145:4037-4045, 1990; Zamai et
al., Cytometry 14:891-897, 1993; Gorczyca et al., International
Journal of Oncology 1:639-648, 1992.

Assays for proteins that influence early steps of T-cell
commitment and development include, without limitation, those
described in: Antica et al., Blood 84:111-117, 1994; Fine et al.,
Cellular Immunology 155:111-122, 1994; Galy et al., Blood
85:2770-2778, 1995; Toki et al., Proc. Nat. Acad. Sci. USA
88:7548-7551, 1991.
Hematopoiesis Regulating Activity

A protein of the present invention may be useful in
regulation of hematopoiesis and, consequently, in the treatment
of myeloid or lymphoid cell deficiencies. Even marginal
biological activity in support of colony forming cells or of
factor-dependent cell lines indicates involvement in regulating
hematopoiesis, e.g., in supporting the growth and proliferation of
erythroid progenitor cells alone or in combination with other
cytokines, thereby indicating utility, for example, in treating
various anemias or for use in conjunction with
irradiation/chemotherapy to stimulate the production of erythroid
precursors and/or erythroid cells; in supporting the growth and
proliferation of myeloid cells such as granulocytes and
monocytes/macrophages (i.e., traditional CSF activity) useful,
for example, in conjunction with chemotherapy to prevent or treat
consequent myelo-suppression; in supporting the growth and
proliferation of megakaryocytes and consequently of platelets,
thereby allowing prevention or treatment of various platelet
disorders such as thrombocytopenia, and generally for use in
place of or complementary to platelet transfusions; and/or in
supporting the growth and proliferation of hematopoietic stem
cells which are capable of maturing to any and all of the
above-mentioned hematopoietic cells and therefore find therapeutic
utility in various stem cell disorders (such as those usually
treated with transplantation, including, without limitation,
aplastic anemia and paroxysmal nocturnal hemoglobinuria), as well
as in repopulating the stem cell compartment post
irradiation/chemotherapy, either in vivo or ex vivo (i.e., in
conjunction with bone marrow transplantation or with peripheral
progenitor cell transplantation (homologous or heterologous)) as
normal cells or genetically manipulated for gene therapy.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Suitable assays for proliferation and differentiation of
various hematopoietic lines are cited above.

Assays for embryonic stem cell differentiation (which will
identify, among others, proteins that influence embryonic
differentiation hematopoiesis) include, without limitation, those
described in: Johansson et al., Cellular Biology 15:141-151, 1995;
Keller et al., Molecular and Cellular Biology 13:473-486, 1993;
McClanahan et al., Blood 81:2903-2915, 1993.

Assays for stem cell survival and differentiation (which
will identify, among others, proteins that regulate
lympho-hematopoiesis) include, without limitation, those
described in: Methylcellulose colony forming assays, Freshney,
M. G. In Culture of Hematopoietic Cells. R. I. Freshney, et al.
eds. Vol pp. 265-268, Wiley-Liss, Inc., New York, N.Y. 1994;
Hirayama et al., Proc. Natl. Acad. Sci. USA 89:5907-5911, 1992;
Primitive hematopoietic colony forming cells with high
proliferative potential, McNiece, I. K. and Briddell, R. A. In
Culture of Hematopoietic Cells. R. I. Freshney, et al. eds. Vol
pp. 23-39, Wiley-Liss, Inc., New York, N.Y. 1994; Neben et al.,
Experimental Hematology 22:353-359, 1994; Cobblestone area
forming cell assays, Ploemacher, R. E. In Culture of
Hematopoietic Cells. R. I. Freshney, et al. eds. Vol pp. 1-21,
Wiley-Liss, Inc., New York, N.Y. 1994; Long term bone marrow
cultures in the presence of stromal cells, Spooncer, E., Dexter,
M. and Allen, T. In Culture of Hematopoietic Cells. R. I.
Freshney, et al. eds. Vol pp. 163-179, Wiley-Liss, Inc., New
York, N.Y. 1994; Long term culture initiating cell assay,
Sutherland, H. J. In Culture of Hematopoietic Cells. R. I.
Freshney, et al. eds. Vol pp. 139-162, Wiley-Liss, Inc., New
York, N.Y. 1994.

Tissue Growth Activity

A protein of the present invention also may have utility in
compositions used for bone, cartilage, tendon, ligament and/or
nerve tissue growth or regeneration, as well as for wound healing
and tissue repair and replacement, and in the treatment of burns,
incisions and ulcers.

A protein of the present invention, which induces cartilage
and/or bone growth in circumstances where bone is not normally
formed, has application in the healing of bone fractures and
cartilage damage or defects in humans and other animals. Such a
preparation employing a protein of the invention may have
prophylactic use in closed as well as open fracture reduction and
also in the improved fixation of artificial joints. De novo bone
formation induced by an osteogenic agent contributes to the
repair of congenital, trauma induced, or oncologic resection
induced craniofacial defects, and also is useful in cosmetic
plastic surgery.

A protein of this invention may also be used in the
treatment of periodontal disease, and in other tooth repair
processes. Such agents may provide an environment to attract
bone-forming cells, stimulate growth of bone-forming cells or
induce differentiation of progenitors of bone-forming cells. A
protein of the invention may also be useful in the treatment of
osteoporosis or osteoarthritis, such as through stimulation of
bone and/or cartilage repair or by blocking inflammation or
processes of tissue destruction (collagenase activity, osteoclast
activity, etc.) mediated by inflammatory processes.

Another category of tissue regeneration activity that may be
attributable to the protein of the present invention is
tendon/ligament formation. A protein of the present invention,
which induces tendon/ligament-like tissue or other tissue
formation in circumstances where such tissue is not normally
formed, has application in the healing of tendon or ligament
tears, deformities and other tendon or ligament defects in humans
and other animals. Such a preparation employing a
tendon/ligament-like tissue inducing protein may have
prophylactic use in preventing damage to tendon or ligament
tissue, as well as use in the improved fixation of tendon or
ligament to bone or other tissues, and in repairing defects to
tendon or ligament tissue. De novo tendon/ligament-like tissue
formation induced by a composition of the present invention
contributes to the repair of congenital, trauma induced, or other
tendon or ligament defects of other origin, and is also useful in
cosmetic plastic surgery for attachment or repair of tendons or
ligaments. The compositions of the present invention may provide
an environment to attract tendon- or ligament-forming cells,
stimulate growth of tendon- or ligament-forming cells, induce
differentiation of progenitors of tendon- or ligament-forming
cells, or induce growth of tendon/ligament cells or progenitors
ex vivo for return in vivo to effect tissue repair. The
compositions of the invention may also be useful in the treatment
of tendonitis, carpal tunnel syndrome and other tendon or
ligament defects. The compositions may also include an
appropriate matrix and/or sequestering agent as a carrier as is
well known in the art.

The protein of the present invention may also be useful for
proliferation of neural cells and for regeneration of nerve and
brain tissue, i.e., for the treatment of central and peripheral
nervous system diseases and neuropathies, as well as mechanical
and traumatic disorders, which involve degeneration, death or
trauma to neural cells or nerve tissue. More specifically, a
protein may be used in the treatment of diseases of the
peripheral nervous system, such as peripheral nerve injuries,
peripheral neuropathy and localized neuropathies, and central
nervous system diseases, such as Alzheimer's, Parkinson's
disease, Huntington's disease, amyotrophic lateral sclerosis, and
Shy-Drager syndrome. Further conditions which may be treated in
accordance with the present invention include mechanical and
traumatic disorders, such as spinal cord disorders, head trauma
and cerebrovascular diseases such as stroke. Peripheral
neuropathies resulting from chemotherapy or other medical
therapies may also be treatable using a protein of the invention.

Proteins of the invention may also be useful to promote
better or faster closure of non-healing wounds, including without
limitation pressure ulcers, ulcers associated with vascular
insufficiency, surgical and traumatic wounds, and the like.

It is expected that a protein of the present invention may
also exhibit activity for generation or regeneration of other
tissues, such as organs (including, for example, pancreas, liver,
intestine, kidney, skin, endothelium), muscle (smooth, skeletal
or cardiac) and vascular (including vascular endothelium) tissue,
or for promoting the growth of cells comprising such tissues.
Part of the desired effects may be by inhibition or modulation of
fibrotic scarring to allow normal tissue to regenerate. A protein
of the invention may also exhibit angiogenic activity.

A protein of the present invention may also be useful for
gut protection or regeneration and treatment of lung or liver
fibrosis, reperfusion injury in various tissues, and conditions
resulting from systemic cytokine damage.

A protein of the present invention may also be useful for
promoting or inhibiting differentiation of tissues described
above from precursor tissues or cells, or for inhibiting the
growth of tissues described above.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for tissue generation activity include, without
limitation, those described in: International Patent Publication
No. WO95/16035 (bone, cartilage, tendon); International Patent
Publication No. WO95/05846 (nerve, neuronal); International
Patent Publication No. WO91/07491 (skin, endothelium).

Assays for wound healing activity include, without
limitation, those described in: Winter, Epidermal Wound Healing,
pps. 71-112 (Maibach, H. I. and Rovee, D. T., eds.), Year Book
Medical Publishers, Inc., Chicago, as modified by Eaglstein and
Mertz, J. Invest. Dermatol 71:382-84 (1978).

Activin/Inhibin Activity

A protein of the present invention may also exhibit activin- or
inhibin-related activities. Inhibins are characterized by
their ability to inhibit the release of follicle stimulating
hormone (FSH), while activins are characterized by their
ability to stimulate the release of follicle stimulating hormone
(FSH). Thus, a protein of the present invention, alone or in
heterodimers with a member of the inhibin alpha family, may be
useful as a contraceptive based on the ability of inhibins to
decrease fertility in female mammals and decrease spermatogenesis
in male mammals. Administration of sufficient amounts of other
inhibins can induce infertility in these mammals. Alternatively,
the protein of the invention, as a homodimer or as a heterodimer
with other protein subunits of the inhibin beta group, may be
useful as a fertility inducing therapeutic, based upon the
ability of activin molecules in stimulating FSH release from
cells of the anterior pituitary. See, for example, U.S. Pat. No.
4,798,885. A protein of the invention may also be useful for
advancement of the onset of fertility in sexually immature
mammals, so as to increase the lifetime reproductive performance
of domestic animals such as cows, sheep and pigs.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for activin/inhibin activity include, without
limitation, those described in: Vale et al., Endocrinology
91:562-572, 1972; Ling et al., Nature 321:779-782, 1986; Vale et
al., Nature 321:776-779, 1986; Mason et al., Nature 318:659-663,
1985; Forage et al., Proc. Natl. Acad. Sci. USA 83:3091-3095,
1986.

Chemotactic/Chemokinetic Activity

A protein of the present invention may have chemotactic or
chemokinetic activity (e.g., act as a chemokine) for mammalian
cells, including, for example, monocytes, fibroblasts,
neutrophils, T-cells, mast cells, eosinophils, epithelial and/or
endothelial cells. Chemotactic and chemokinetic proteins can be
used to mobilize or attract a desired cell population to a
desired site of action. Chemotactic or chemokinetic proteins
provide particular advantages in treatment of wounds and other
trauma to tissues, as well as in treatment of localized
infections. For example, attraction of lymphocytes, monocytes or
neutrophils to tumors or sites of infection may result in
improved immune responses against the tumor or infecting agent.

A protein or peptide has chemotactic activity for a
particular cell population if it can stimulate, directly or
indirectly, the directed orientation or movement of such cell
population. Preferably, the protein or peptide has the ability to
directly stimulate directed movement of cells. Whether a
particular protein has chemotactic activity for a population of
cells can be readily determined by employing such protein or
peptide in any known assay for cell chemotaxis.
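Whatever migration assay is chosen, the result is often summarized as a chemotactic index: the mean number of cells that migrate toward the test protein divided by the mean number that migrate toward medium alone. The Python sketch below illustrates only that arithmetic. The helper names, the triplicate well counts and the 2.0-fold cut-off are hypothetical choices made for illustration; they are not taken from this specification or from any of the assays cited here.

```python
from statistics import mean


def chemotactic_index(test_counts, control_counts):
    """Mean migrated cells toward the test protein divided by mean
    migrated cells toward medium alone (hypothetical helper)."""
    control = mean(control_counts)
    if control == 0:
        raise ValueError("medium-only wells must contain migrated cells")
    return mean(test_counts) / control


def is_chemotactic(test_counts, control_counts, threshold=2.0):
    # Assumed cut-off (hypothetical): call the protein chemotactic when
    # migration is at least `threshold`-fold above medium-only background.
    return chemotactic_index(test_counts, control_counts) >= threshold


# Hypothetical cell counts from triplicate wells of a membrane migration assay.
protein_wells = [120, 135, 150]
medium_wells = [40, 45, 50]
print(chemotactic_index(protein_wells, medium_wells))  # 3.0
print(is_chemotactic(protein_wells, medium_wells))     # True
```

A ratio is used here because it normalizes for random (background) migration in the medium-only wells; an actual study would also compare replicates statistically rather than rely on a single fixed cut-off.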

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for chemotactic activity (which will identify
proteins that induce or prevent chemotaxis) consist of assays
that measure the ability of a protein to induce the migration of
cells across a membrane as well as the ability of a protein to
induce the adhesion of one cell population to another cell
population. Suitable assays for movement and adhesion include,
without limitation, those described in: Current Protocols in
Immunology, Ed by J. E. Coligan, A. M. Kruisbeek, D. H.
Margulies, E. M. Shevach, W. Strober, Pub. Greene Publishing
Associates and Wiley-Interscience (Chapter 6.12, Measurement of
alpha and beta Chemokines 6.12.1-6.12.28); Taub et al., J. Clin.
Invest. 95:1370-1376, 1995; Lind et al., APMIS 103:140-146, 1995;
Muller et al., Eur. J. Immunol. 25:1744-1748; Gruber et al., J.
of Immunol. 152:5860-5867, 1994; Johnston et al., J. of Immunol.
153:1762-1768.

Hemostatic and Thrombolytic Activity

A protein of the invention may also exhibit hemostatic or
thrombolytic activity. As a result, such a protein is expected to
be useful in treatment of various coagulation disorders
(including hereditary disorders, such as hemophilias) or to
enhance coagulation and other hemostatic events in treating
wounds resulting from trauma, surgery or other causes. A protein
of the invention may also be useful for dissolving or inhibiting
formation of thromboses and for treatment and prevention of
conditions resulting therefrom (such as, for example, infarction
of cardiac and central nervous system vessels (e.g., stroke)).

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Assays for hemostatic and thrombolytic activity include,
without limitation, those described in: Linet et al., J. Clin.
Pharmacol. 26:131-140, 1986; Burdick et al., Thrombosis Res.
45:413-419, 1987; Humphrey et al., Fibrinolysis 5:71-79 (1991);
Schaub, Prostaglandins 35:467-474, 1988.

Receptor/Ligand Activity

A protein of the present invention may also demonstrate
activity as receptors, receptor ligands or inhibitors or agonists
of receptor/ligand interactions. Examples of such receptors and
ligands include, without limitation, cytokine receptors and their
ligands, receptor kinases and their ligands, receptor
phosphatases and their ligands, receptors involved in cell-cell
interactions and their ligands (including without limitation,
cellular adhesion molecules (such as selectins, integrins and
their ligands) and receptor/ligand pairs involved in antigen
presentation, antigen recognition and development of cellular and
humoral immune responses). Receptors and ligands are also useful
for screening of potential peptide or small molecule inhibitors
WO 01/98454 PCT/IB01/02050 
of the relevant receptor/ligand interaction. A protein of the
present invention (including, without limitation, fragments of
receptors and ligands) may itself be useful as an inhibitor of
receptor/ligand interactions.

The activity of a protein of the invention may, among other
means, be measured by the following methods:

Suitable assays for receptor-ligand activity include without
limitation those described in: Current Protocols in Immunology,
Ed by J. E. Coligan, A. M. Kruisbeek, D. H. Margulies, E. M.
Shevach, W. Strober, Pub. Greene Publishing Associates and
Wiley-Interscience (Chapter 7.28, Measurement of Cellular Adhesion
under static conditions 7.28.1-7.28.22), Takai et al., Proc.
Natl. Acad. Sci. USA 84:6864-6868, 1987; Bierer et al., J. Exp.
Med. 168:1145-1156, 1988; Rosenstein et al., J. Exp. Med.
169:149-160, 1989; Stoltenborg et al., J. Immunol. Methods
175:59-68, 1994; Stitt et al., Cell 80:661-670, 1995.

Anti-Inflammatory Activity

Proteins of the present invention may also exhibit
anti-inflammatory activity. The anti-inflammatory activity may be
achieved by providing a stimulus to cells involved in the
inflammatory response, by inhibiting or promoting cell-cell
interactions (such as, for example, cell adhesion), by inhibiting
or promoting chemotaxis of cells involved in the inflammatory
process, inhibiting or promoting cell extravasation, or by
stimulating or suppressing production of other factors which more
directly inhibit or promote an inflammatory response. Proteins
exhibiting such activities can be used to treat inflammatory
conditions, including chronic or acute conditions, including
without limitation inflammation associated with infection (such as
septic shock, sepsis or systemic inflammatory response syndrome
(SIRS)), ischemia-reperfusion injury, endotoxin lethality,
arthritis, complement-mediated hyperacute rejection, nephritis,
cytokine or chemokine-induced lung injury, inflammatory bowel
disease, Crohn's disease, or conditions resulting from
overproduction of cytokines such as TNF or IL-1. Proteins of the
invention may also be useful to treat anaphylaxis and
hypersensitivity to an antigenic substance or material.
Tumor Inhibition Activity

In addition to the activities described above for
immunological treatment or prevention of tumors, a protein of the
invention may exhibit other anti-tumor activities. A protein may
inhibit tumor growth directly or indirectly (such as, for
example, via ADCC). A protein may exhibit its tumor inhibitory
activity by acting on tumor tissue or tumor precursor tissue, by
inhibiting formation of tissues necessary to support tumor growth
(such as, for example, by inhibiting angiogenesis), by causing
production of other factors, agents or cell types which inhibit
tumor growth, or by suppressing, eliminating or inhibiting
factors, agents or cell types which promote tumor growth.

Other Activities

A protein of the invention may also exhibit one or more of
the following additional activities or effects: inhibiting the
growth, infection or function of, or killing, infectious agents,
including, without limitation, bacteria, viruses, fungi and other
parasites; affecting (suppressing or enhancing) bodily
characteristics, including, without limitation, height, weight,
hair color, eye color, skin, fat to lean ratio or other tissue
pigmentation, or organ or body part size or shape (such as, for
example, breast augmentation or diminution, change in bone form
or shape); affecting biorhythms or circadian cycles or rhythms;
affecting the fertility of male or female subjects; affecting the
metabolism, catabolism, anabolism, processing, utilization,
storage or elimination of dietary fat, lipid, protein,
carbohydrate, vitamins, minerals, cofactors or other nutritional
factors or component(s); affecting behavioral characteristics,
including, without limitation, appetite, libido, stress,
cognition (including cognitive disorders), depression (including
depressive disorders) and violent behaviors; providing analgesic
effects or other pain reducing effects; promoting differentiation
and growth of embryonic stem cells in lineages other than
hematopoietic lineages; hormonal or endocrine activity; in the
case of enzymes, correcting deficiencies of the enzyme and
treating deficiency-related diseases; treatment of
hyperproliferative disorders (such as, for example, psoriasis);
immunoglobulin-like activity (such as, for example, the ability
to bind antigens or complement); and the ability to act as an
antigen in a vaccine composition to raise an immune response
against such protein or another material or entity which is
cross-reactive with such protein.

Particular Applications for Certain Clones

The following sets out a non-exclusive list of applications
for certain embodiments of the invention. In the interest of
economy, applications relevant to multiple embodiments are not
duplicated in this list. Other embodiments described herein have
similar characteristics, as described there. The artisan is
directed, therefore, to the Description of the Sequences for
similar descriptions of the functions of other embodiments.



Testes:

htes3_lDilb: The new protein can find application in the
diagnosis/therapy of leukemia predisposition/disease and in the
modulation of DNA repair.

htes3_10nlQ: The new protein can find application in
studying the expression profile of testis-specific genes.

htes3_llal?: The new protein can find application in
studying the expression profile of testis-specific genes and
as a new marker for testicular cells.

htes3_llc52: The new protein can find application in
modulating/blocking regulatory pathways.

htes3_lldEl: The new protein can find application in
diagnosis of diseases due to abnormal protein degradation,
like muscular dystrophy or multiple sclerosis, as well as in
modulating the half-life of specific proteins and in
expression profiling.

Kidney:

hfkd5_3kl: The new protein can find application in modulation
of endocytosis (strong similarity to testicular dynamin, Rattus
norvegicus).
Amygdala:

hamy5_lQhl7: The new protein can find application in
modulating protein-protein interaction and in studying the
expression profile of amygdala-specific genes.

hamy2_10p7: The new protein can find application in
modulation of Na+/Ca2+ exchange and voltage-dependent
processes.

hamy5_lld2: The new protein can find application in studying
the expression profile of amygdala-specific genes and as a
new marker for amygdala cells.

hamy2_llni4: The new protein can find application in
modulation of DNA repair and as a new tool for
manipulation of nucleic acids.

hamy2_ieifn: The new protein can find application in
modulation of cytoskeleton-membrane interactions.

Fetal Brain:

hfbr2_7flcl2: The new protein can find application in the
modulation of translational pathways.

hfbr2_7fldlfl: The new protein can find application in
studying the expression profile of brain-specific genes.

hfbr2_76d4: The new protein can find application in studying
the expression profile of brain-specific genes and as a new
marker for amygdala cells.

hfbr2_7fiel8: The new protein can find application in
studying the expression profile of brain-specific genes.

hfbr2_7fii21: The new protein can find application in
diagnosis/modulation of protein damage and age-related
degenerative processes.



Melanoma: 




hmel2_12jl: The new protein can find application in studying
the expression profile of melanoma-specific genes.

hmel2_7gm: The new protein can find application in
modulation of the sorting of proteins into different
compartments.

hmel2_TklT: The new protein can find application in studying
the expression profile of melanoma-specific genes.

VARIANTS OF THE INVENTIVE DNA MOLECULES

Variants in General

"Variants," according to the invention, include DNA and/or
protein molecules that resemble, structurally and/or functionally,
those set forth herein. Variants may be isolated from natural
sources ("homologs"), may be entirely synthetic or may be based in
part on both natural and synthetic approaches.

The section set forth below presents various structural and
functional characteristics of molecules within the invention.
Preferred molecules are characterized by a combination of one or
more of these characteristics. For instance, some preferred
molecules are described with reference to at least two structural
characteristics, while others may be described with reference to
at least one structural and at least one functional
characteristic.

It will be recognized by the skilled artisan that structure
ultimately defines function, i.e., the functions of the molecules
described herein derive from the structures of those molecules.
Accordingly, the structural variants described below that bear the
closest structural relationship (as variously defined below) to
the inventive molecules are the variants that most likely will
preserve biological function. This relationship between structure
and function will guide the skilled artisan in identifying the
preferred embodiments of the invention.
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Splicing Variants 

It is well known that eukaryotic structural genes comprise
both protein-coding and non-coding portions. When messenger RNA is
transcribed from the DNA template, it contains introns, which are
non-coding, and exons, which are coding. In order to form a
translation-competent mRNA, the introns must be "spliced" out of
this initial pre-mRNA.

Specific sequences within the pre-mRNA represent "splice
junctions" that direct the cellular splicing machinery to the
appropriate position. The splice junctions are loosely conserved
sequence regions of the pre-mRNA, which almost invariably begin
with GT and end with AG (DNA perspective). The 5' end of the
splice junction typically contains about nine somewhat conserved
residues, for example, C/AAGTA/GAGT. The 3' end usually contains
a pyrimidine-rich stretch of at least about 11 nucleotides,
followed by NC/TAGG. Splicing occurs before the GT and after the
AG. Mount, Nucleic Acids Res. 10:459-72 (1982).
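The GT...AG rule and the 3' pyrimidine-rich stretch described above can be turned into a crude sequence scan. The Python sketch below is illustrative only: it encodes just the features named in the text (a putative intron starting with GT, and an acceptor made of at least 11 pyrimidines, any nucleotide, C or T, then AG), ignores the 5' consensus and the branch point, and its `min_len` cutoff is a hypothetical parameter. It is not a splice-site predictor.

```python
import re

# Acceptor (3' splice site) as described in the text: a pyrimidine-rich
# stretch (modelled strictly here as >= 11 C/T), then any nucleotide (N),
# then C or T, then the invariant AG.
ACCEPTOR = re.compile(r"[CT]{11,}[ACGT][CT]AG")


def candidate_introns(seq: str, min_len: int = 20) -> list[tuple[int, int]]:
    """Return (start, end) spans that begin with GT and end at an acceptor AG.

    `end` is exclusive, so seq[start:end] is the putative intron and the
    downstream exon resumes at seq[end]. `min_len` is a hypothetical cutoff.
    """
    seq = seq.upper()
    acceptor_ends = [m.end() for m in ACCEPTOR.finditer(seq)]
    spans = []
    for start in range(len(seq) - 1):
        if seq[start:start + 2] != "GT":
            continue
        for end in acceptor_ends:
            if end - start >= min_len:
                spans.append((start, end))
    return spans


# Toy sequence: exon "AAG", intron GTAAGT...CAG, exon "GCC".
toy = "AAG" + "GTAAGT" + "AAAAA" + "TTTTTCCCCCTT" + "A" + "CAG" + "GCC"
print(candidate_introns(toy))
# [(3, 30), (7, 30)]
```

The second span shows why such a scan over-reports: any GT upstream of a plausible acceptor qualifies, which is one reason real splice-site models also score the 5' consensus.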

Interestingly, exons often correspond to discrete functional
domains of the protein product. The intron/exon arrangement thus
creates a linear array of nucleotides which can be correlated to
discrete, and often interchangeable, functional protein fragments.
Go, Nature 291:90-92 (1981); Branden et al., EMBO J. 3:1307-10
(1984). This linear arrangement creates the possibility of
generating multiple different full-length proteins by rearranging
the order of the different functional portions in the array. For
example, if a set of exons is arranged 1-2-3-4, where (-)
represents the introns separating the exons, a splicing event need
not simply produce 1234, but may produce 123, 134, 124 and so on.
Production of different mRNA products in this way is commonly
called "alternative splicing." Andreadis et al., Ann. Rev. Cell
Biol. 3:207-42 (1987).
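The 1-2-3-4 example can be enumerated mechanically. The sketch below covers only the exon-skipping case shown in that example (exons kept in their original order, some omitted); `min_exons` is a hypothetical cutoff, and reordered products are not generated.

```python
from itertools import combinations


def exon_skipping_products(exons: list[str], min_exons: int = 2) -> list[str]:
    """List order-preserving exon combinations, longest products first."""
    products = []
    for k in range(len(exons), min_exons - 1, -1):
        # combinations() preserves input order, so exons are never shuffled.
        for combo in combinations(exons, k):
            products.append("".join(combo))
    return products


print(exon_skipping_products(["1", "2", "3", "4"], min_exons=3))
# ['1234', '123', '124', '134', '234']
```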

Some of the present DNA molecules can be represented in
modular fashion in terms of their coding regions. Essentially,
these modules are exons (though each "exon" may in fact be made up
of several exons), which may be combined in different ways to form
a variety of different DNA molecules, each encoding a different
functional protein. Splicing variants are indicated in the
Description of the Sequences.

Degenerate Variants

One aspect of the present invention provides "degenerate
variants" of the nucleic acid fragments of the present invention.
A "degenerate variant" is a nucleotide fragment which differs from
those of inventive molecules by nucleotide sequence but, due to
the degeneracy of the genetic code, encodes an identical
polypeptide sequence.

Given the known relationship between DNA sequences and the
proteins they encode, degenerate variants typically are described
by reference to this relationship. It is well known that the
degeneracy of the genetic code results in many possible DNA
sequences which encode a particular protein. Indeed, of the three
bases which comprise an amino acid-encoding triplet, the third
position, and often the second, almost always may vary. This fact
alone allows for a class of variant DNA molecules which encode
protein sequences identical to those disclosed herein, yet have
about 30% sequence variation. In other words, the variant DNA
molecules are about 70% identical to the inventive DNAs, having no
additional or deleted sequences. Thus, one aspect of the
invention provides degenerate variant DNA molecules encoding the
inventive protein sequences.
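A degenerate variant can be generated and checked computationally. The sketch below assumes the standard genetic code (NCBI translation table 1) and uses a deliberately simple, hypothetical rule: each codon is replaced by the alphabetically first different synonymous codon, where one exists. Translation confirms that the encoded protein is unchanged, and identity is counted position by position, which is valid here because codon swaps add or delete no sequence.

```python
# Standard genetic code (NCBI table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate an in-frame coding sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def degenerate_variant(dna: str) -> str:
    """Swap each codon for a different synonymous codon when one exists."""
    out = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        synonyms = sorted(
            c for c, aa in CODON_TABLE.items()
            if aa == CODON_TABLE[codon] and c != codon
        )
        out.append(synonyms[0] if synonyms else codon)  # Met and Trp have none
    return "".join(out)


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


original = "ATGGCTCTGAAATGG"  # encodes M-A-L-K-W
variant = degenerate_variant(original)
print(variant, translate(original) == translate(variant),
      percent_identity(original, variant))
# ATGGCACTAAAGTGG True 80.0
```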

In one embodiment, these variants have at least about 70%
sequence identity with the DNA molecules described herein. In a
preferred embodiment, these variants have at least about 80%
sequence identity to the inventive molecules. In a more preferred
embodiment, these variants have at least about 90% sequence
identity with the inventive molecules.

Conservative Amino Acid Variants 

Variants according to the invention also may be made that
conserve the overall molecular structure of the encoded proteins.
Given the properties of the individual amino acids comprising the
disclosed protein products, some rational substitutions will be
recognized by the skilled worker. Amino acid substitutions, i.e.,
"conservative substitutions," may be made, for instance, on the
basis of similarity in polarity, charge, solubility,
hydrophobicity, hydrophilicity, and/or the amphipathic nature of
the residues involved.

For example: (a) nonpolar (hydrophobic) amino acids include
alanine, leucine, isoleucine, valine, proline, phenylalanine,
tryptophan, and methionine; (b) polar neutral amino acids include
glycine, serine, threonine, cysteine, tyrosine, asparagine, and
glutamine; (c) positively charged (basic) amino acids include
arginine, lysine, and histidine; and (d) negatively charged
(acidic) amino acids include aspartic acid and glutamic acid.
Substitutions typically may be made within groups (a)-(d). In
addition, glycine and proline may be substituted for one another
based on their ability to disrupt α-helices. Similarly, certain
amino acids, such as alanine, cysteine, leucine, methionine,
glutamic acid, glutamine, histidine and lysine, are more commonly
found in α-helices, while valine, isoleucine, phenylalanine,
tyrosine, tryptophan and threonine are more commonly found in
β-pleated sheets. Glycine, serine, aspartic acid, asparagine, and
proline are commonly found in turns. Some preferred substitutions
may be made among the following groups: (i) S and T; (ii) P and G;
and (iii) V, L and I. Given the known genetic code, and
recombinant and synthetic DNA techniques, the skilled scientist
readily can construct DNAs encoding the conservative amino acid
variants.

As used herein, "sequence identity" between two polypeptide
sequences indicates the percentage of amino acids that are
identical between the sequences. "Sequence similarity" indicates
the percentage of amino acids that either are identical or that
represent conservative amino acid substitutions.
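The two definitions can be computed directly once two sequences are aligned. The Python sketch below assumes two pre-aligned, gap-free polypeptides of equal length, and treats a substitution as conservative if both residues fall in the same group (a)-(d) listed above, or form the glycine/proline pair; the group table and function names are illustrative, not part of the disclosure.

```python
# Groups (a)-(d) from the text, using one-letter amino acid codes.
GROUPS = [
    set("ALIVPFWM"),  # (a) nonpolar (hydrophobic)
    set("GSTCYNQ"),   # (b) polar neutral
    set("RKH"),       # (c) positively charged (basic)
    set("DE"),        # (d) negatively charged (acidic)
]
EXTRA_PAIRS = {frozenset("GP")}  # helix breakers, interchangeable per the text


def is_conservative(x: str, y: str) -> bool:
    """True if x -> y is a non-identical conservative substitution."""
    if x == y:
        return False
    if frozenset((x, y)) in EXTRA_PAIRS:
        return True
    return any(x in g and y in g for g in GROUPS)


def identity_and_similarity(s1: str, s2: str) -> tuple[float, float]:
    """Percent identity and percent similarity of two aligned, gap-free sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("expected two non-empty sequences of equal length")
    identical = sum(a == b for a, b in zip(s1, s2))
    conservative = sum(is_conservative(a, b) for a, b in zip(s1, s2))
    n = len(s1)
    return 100.0 * identical / n, 100.0 * (identical + conservative) / n


print(identity_and_similarity("MKDSLGAA", "MRESIPAD"))
# (37.5, 87.5)
```

Here K/R, D/E, L/I and G/P count toward similarity, while the final A/D change (nonpolar to acidic) does not.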

Functionally Equivalent Variants 

Yet another class of DNA variants within the scope of the
invention may be described with reference to the product they
encode. As shown in the Description of the Sequences, some of the
inventive DNA molecules encode a protein having a degree of
homology with known proteins, or protein domains. It is expected,
therefore, that they will have some or all of the requisite
functional features of such molecules. These "functionally
equivalent variant" products are characterized by the fact that
they are functionally equivalent, with respect to biological
activity, to certain known molecules.

Also provided herein is information on common structural
motifs, including consensus sequences that will guide the artisan
in constructing functionally equivalent variants. It will be
understood that the motifs, identified in the Description of the
Sequences for each inventive protein, may be modified within the
identified consensus sequences. Thus, the invention contemplates
the proteins in the Description of the Sequences that contain
variability in the consensus sequences identified, and the
invention further contemplates the full range of nucleic acids
encoding them, and the complements of those nucleic acids.
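Whether a modified protein still fits a consensus sequence can be checked with a pattern match. The source does not say how its consensus sequences are written; the sketch below assumes a simplified PROSITE-style notation (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, elements joined by `-`), and the example pattern is hypothetical rather than one taken from the Description of the Sequences.

```python
import re

# One element of the simplified notation: a core, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def consensus_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style consensus into a Python regex.

    Anchors ('<', '>') and other parts of the full syntax are not handled.
    """
    parts = []
    for element in pattern.split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                      # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


motif = "C-x(2,4)-C-x(3)-[LIVM]"  # hypothetical consensus
regex = consensus_to_regex(motif)
print(regex)                                 # C.{2,4}C.{3}[LIVM]
print(bool(re.search(regex, "AACKKCAAAL")))  # True: fits the consensus
print(bool(re.search(regex, "AACKKCAAAP")))  # False: P not allowed at the end
```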

Hybridizing Variants 

DNA variants within the invention also may be described by
reference to their physical properties in hybridization. One
skilled in the field will recognize that DNA can be used to
identify its complement and, since DNA is double stranded, its
equivalent or homolog, using nucleic acid hybridization
techniques. It will also be recognized that hybridization can
occur with less than 100% complementarity. However, given
appropriate choice of conditions, hybridization techniques can be
used to differentiate among DNA sequences based on their
structural relatedness to a particular probe. For guidance
regarding such conditions, see, for example, Sambrook et al., 1989,
MOLECULAR CLONING: A LABORATORY MANUAL, Cold Spring Harbor Press,
N.Y.; and Ausubel et al., 1989, CURRENT PROTOCOLS IN MOLECULAR
BIOLOGY, Greene Publishing Associates and Wiley Interscience, N.Y.

Structural relatedness between two polynucleotide sequences
can be expressed as a function of "stringency" of the conditions
under which the two sequences will hybridize with one another. As
used herein, the term "stringency" refers to the extent that the
conditions disfavor hybridization. Stringent conditions strongly
disfavor hybridization, and only the most structurally related
molecules will hybridize to one another under such conditions.
Conversely, non-stringent conditions favor hybridization of
molecules displaying a lesser degree of structural relatedness.
Hybridization stringency, therefore, directly correlates with the
structural relationships of two nucleic acid sequences. The
following relationships are useful in correlating hybridization
and relatedness (where $T_m$ is the melting temperature of a nucleic
acid duplex):



a. $T_m = 69.3 + 0.41\,(\%\,G{+}C)$

b. The $T_m$ of a duplex DNA decreases by 1°C with every
increase of 1% in the number of mismatched base pairs.

c. $(T_m)_{\mu_2} - (T_m)_{\mu_1} = 18.5\,\log_{10}(\mu_2/\mu_1)$,
where $\mu_1$ and $\mu_2$ are the ionic strengths of two
solutions.
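The three relationships can be chained to estimate how a probe/target duplex behaves. The sketch below simply evaluates relationships (a)-(c) as stated: G+C content and mismatch level are percentages, the two ionic strengths may be in any consistent unit, and the function names and example values are illustrative. These are rules of thumb, not a thermodynamic model.

```python
import math


def tm_from_gc(gc_percent: float) -> float:
    """Relationship (a): Tm (deg C) from the G+C content in percent."""
    return 69.3 + 0.41 * gc_percent


def tm_with_mismatch(tm: float, mismatch_percent: float) -> float:
    """Relationship (b): Tm falls about 1 deg C per 1% mismatched base pairs."""
    return tm - 1.0 * mismatch_percent


def tm_at_new_ionic_strength(tm_mu1: float, mu1: float, mu2: float) -> float:
    """Relationship (c): shift a Tm measured at ionic strength mu1 to mu2."""
    return tm_mu1 + 18.5 * math.log10(mu2 / mu1)


tm = tm_from_gc(50.0)                                    # perfect duplex, 50% G+C
tm_10 = tm_with_mismatch(tm, 10.0)                       # 10% mismatched base pairs
tm_low_salt = tm_at_new_ionic_strength(tm_10, 1.0, 0.1)  # tenfold lower salt
print(round(tm, 1), round(tm_10, 1), round(tm_low_salt, 1))
# 89.8 79.8 61.3
```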

Hybridization stringency is a function of many factors,
including overall DNA concentration, ionic strength, temperature,
probe size and the presence of agents which disrupt hydrogen
bonding. Factors promoting hybridization include high DNA
concentrations, high ionic strengths, low temperatures, longer
probe size and the absence of agents that disrupt hydrogen
bonding.

Hybridization usually is done in two stages. First, in the
"binding" stage, the probe is bound to the target under conditions
favoring hybridization. Stringency is usually controlled at this
stage by altering the temperature. For high stringency, the
temperature is usually between 65°C and 70°C, unless short (<20
nt) oligonucleotide probes are used. A representative
hybridization solution comprises 6X SSC, SDS, 5X Denhardt's
solution and 100 μg of non-specific carrier DNA. See Ausubel et
al., supra, section 2.9, supplement 27 (1994). Of course, many
different, yet functionally equivalent, buffer conditions are
known. Where the degree of relatedness is lower, a lower
temperature may be chosen. Low stringency binding temperatures
are between about 25°C and 40°C. Medium stringency is between at
least about 40°C and less than about 65°C. High stringency is at
least about 65°C.
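The binding-temperature bands just given can be encoded as a simple lookup. In the sketch below, a temperature of exactly 40°C is assigned to medium stringency (the low and medium bands meet at about 40°C), temperatures below 25°C are reported as outside the stated ranges, and the function name is illustrative.

```python
def binding_stringency(temp_c: float) -> str:
    """Classify a hybridization (binding) temperature by the ranges in the text."""
    if temp_c >= 65:
        return "high"
    if temp_c >= 40:
        return "medium"
    if temp_c >= 25:
        return "low"
    return "below stated ranges"


for t in (30, 42, 65, 68):
    print(t, binding_stringency(t))
```

As the text notes, these bands assume probes of roughly 20 nt or longer; short oligonucleotide probes need their own temperatures.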

Second, the excess probe is removed by washing. It is at
this stage that more stringent conditions usually are applied.
Hence, it is this "washing" stage that is most important in
determining relatedness via hybridization. Washing solutions
typically contain lower salt concentrations. One exemplary medium
stringency solution contains 2X SSC and 0.1% SDS. A high
stringency wash solution contains the equivalent (in ionic
strength) of less than about 0.2X SSC, with a preferred stringent
solution containing about 0.1X SSC. The temperatures associated
with various stringencies are the same as discussed above for
"binding." The washing solution also typically is replaced a
number of times during washing. For example, typical high
stringency washing conditions comprise washing twice for 30
minutes at 55°C and three times for 15 minutes at 60°C.

The present invention includes nucleic acid molecules that
hybridize to the inventive molecules under high stringency binding
and washing conditions. More preferred molecules (from an mRNA
perspective) are those that are at least 50% of the length of any
one of those depicted in the Description of the Sequences.
Particularly preferred molecules are at least 75% of the length
of those molecules.

Substitutions, Insertions, Additions and Deletions

In a general sense, the preferred DNA variants of the
invention are those that retain the closest relationship, as
described by "sequence identity," to the inventive DNA molecules.
According to another aspect of the invention, therefore,
substitutions, insertions, additions and deletions of defined
properties are contemplated. It will be recognized that sequence
identity between two polynucleotide sequences, as defined herein,
generally is determined with reference to the protein coding
region of the sequences. Thus, this definition does not at all
limit the amount of DNA, such as vector DNA, that may be attached
to the molecules described herein. Preferred DNA sequence
variants include molecules encoding proteins sharing some or all
of any relevant biological activity of the native molecule.

In creating these variants, the skilled worker will be guided
by reference to the protein structure. First, insertions and
deletions in any recognized functional domain above generally
should be avoided, except as noted below in the section entitled
"Proteins," where this domain is discussed in detail. Alterations
in such domains usually will be limited to conservative amino acid
substitutions. In addition, where insertions and deletions are
desired, they may be made at the N- and/or C-terminus of
the protein molecule (or the corresponding coding regions of the
DNA). If insertions or deletions are made within the protein,
deletions of major structural features usually should be avoided.
Thus, a preferred place to make insertion or deletion variants is
in non-structural regions, such as linker regions between two
alpha helices.

"Substitutions" generally refer to alterations in the DNA
sequence which do not change its overall length, but only alter
one or more nucleotide positions, substituting one for another in
the common sense of the word. One class of preferred
substitutions, "degenerate substitutions," are those that do not
alter the encoded amino acid sequence. Some substitutions retain
50%, 55%, 60% or 65% identity. Preferred substitutions retain at
least about 70% identity, more preferably at least 70% or 75%
identity, with the inventive DNAs. Some more preferred molecules
have at least about 80% identity, more preferably at least 80% or
85% identity. Particularly preferred DNAs share at least about
90% identity, more preferably at least 90% or 95% identity.

"Insertions," unlike substitutions, alter the overall length
of the DNA molecule, and thus sometimes the encoded protein.
Insertions add extra nucleotides to the interior (not the 5' or 3'
ends) of the subject DNAs. Preferred insertions are made with
reference to the protein sequence encoded by the DNA. Thus, it is
most preferred to provide an insertion in the DNA at a location
that corresponds to an area of the encoded protein which lacks
structure. For instance, if the preservation of biological activity
is desired, it typically would not be beneficial to provide an
insertion within an alpha-helical region or a beta-pleated sheet.
Accordingly, non-structural areas, such as those containing
helix-breaking glycine and proline residues, are most preferred
sites of insertion. Other preferred sites of insertion are the
splice sites, which are indicated above in the description of the
inventive DNA molecules.

While the optimal size of insertions will vary depending upon
the site of insertion and its effect on the overall conformation
of the encoded protein, some general guides are useful.
Generally, the total insertions (irrespective of their number)
should not add more than about 30% (or preferably not more than
30%) to the overall size of the encoded protein. More preferably,
the insertion adds less than about 10-50% (yet more preferably
10-20%) in size, with less than about 10% being most preferred. The
number of insertions is limited only by the number of suitable
insertion sites, and secondarily by the foregoing size
preferences.

"Additions," like insertions, also add to the overall size of
the DNA molecule, and usually the encoded protein. However,
instead of being made within the molecule, they are made on the 5'
or 3' end, usually corresponding to the N- or C-terminus of the
encoded protein. Unlike deletions, additions are not very
size-dependent. Indeed, additions may be of virtually any size.
Preferred additions, however, do not exceed about 100% of the size
of the native molecule. More preferably, they add less than about
60 to 30% to the overall size, with less than about 30% being most
preferred.

"Deletions" diminish the overall size of the DNA and,
therefore, also reduce the size of the protein encoded by that
DNA. Deletions may be made from either end of the molecule or
internal to it. Typical preferred deletions remove discrete
structural features of the encoded protein. For example, some
deletions will comprise the deletion of one or more exons which
may define a structural feature. Preferred deletions remove less
than about 30% of the size of the subject molecule. More
preferred deletions remove less than about 20%, and most preferred
deletions remove less than about 10%.
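The size preferences for deletions reduce to a percentage check against the native length. The sketch below applies the 30%, 20% and 10% thresholds stated above to a deletion of a given length, measured in residues (or nucleotides); the tier labels and function name are illustrative.

```python
def deletion_preference(native_len: int, deleted_len: int) -> str:
    """Rank a deletion by the fraction of the native molecule it removes."""
    if native_len <= 0 or not 0 <= deleted_len < native_len:
        raise ValueError("need 0 <= deleted_len < native_len")
    removed_pct = 100.0 * deleted_len / native_len
    if removed_pct < 10:
        return "most preferred"
    if removed_pct < 20:
        return "more preferred"
    if removed_pct < 30:
        return "preferred"
    return "outside the stated preferences"


for deleted in (15, 30, 50, 80):  # residues removed from a 200-residue protein
    print(deleted, deletion_preference(200, deleted))
```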

Computer-Defined Variants and Definition of "Sequence Identity"

In general, both the DNA and protein molecules of the
invention can be defined with reference to "sequence identity."
As used herein, "sequence identity" refers to a comparison made
between two molecules using, for example, the standard
Smith-Waterman algorithm that is well known in the art.
Some molecules have at least about 50%, 55% or 60% identity.
Preferred molecules are those having at least about 65% sequence
identity, more preferably at least 65% or 70% sequence identity.
Other preferred molecules have at least about 80%, more preferably
at least 80% or 85%, sequence identity. Particularly preferred
molecules have at least about 90% sequence identity, more
preferably at least 90% sequence identity. Most preferred
molecules have at least about 95%, more preferably at least 95%,
sequence identity. As used herein, two nucleic acid molecules or
proteins are said to "share significant sequence identity" if the
two contain regions which possess greater than 85% sequence (amino
acid or nucleic acid) identity.

"Sequence identity" is defined herein with reference to the
Blast 2 algorithm, which is available at the NCBI
(http://www.ncbi.nlm.nih.gov/BLAST), using default parameters.
References pertaining to this algorithm include those found at
http://www.ncbi.nlm.nih.gov/BLAST/blast_references.html;
Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J.
(1990) "Basic local alignment search tool." J. Mol. Biol.
215:403-410; Gish, W. & States, D.J. (1993) "Identification of
protein coding regions by database similarity search." Nature
Genet. 3:266-272; Madden, T.L., Tatusov, R.L. & Zhang, J. (1996)
"Applications of network BLAST server." Meth. Enzymol. 266:131-141;
Altschul, S.F., Madden, T.L., Schäffer, A.A., Zhang, J.,
Zhang, Z., Miller, W. & Lipman, D.J. (1997) "Gapped BLAST and
PSI-BLAST: a new generation of protein database search programs."
Nucleic Acids Res. 25:3389-3402; and Zhang, J. & Madden, T.L.
(1997) "PowerBLAST: A new network BLAST application for
interactive or automated sequence analysis and annotation."
Genome Res. 7:649-656.
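For readers who want to see what a Smith-Waterman comparison involves, the Python sketch below computes a local alignment by dynamic programming and then the percent identity over the aligned columns. The scoring scheme (match +2, mismatch -1, linear gap penalty -2) and the function names are assumptions for illustration only. Identity as defined above is taken from BLAST 2 with default parameters, which uses its own scoring schemes and heuristic search, so values from this sketch will not in general match BLAST output.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> tuple[int, str, str]:
    """Best local alignment of a and b with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def s(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s(i, j),  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,          # gap in b
                          H[i][j - 1] + gap)          # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def aligned_identity(aln_a: str, aln_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if not aln_a:
        return 0.0
    same = sum(x == y and x != "-" for x, y in zip(aln_a, aln_b))
    return 100.0 * same / len(aln_a)


score, top, bottom = smith_waterman("ACGTTCGT", "ACGTACGT")
print(score, top, bottom, aligned_identity(top, bottom))
# 13 ACGTTCGT ACGTACGT 87.5
```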

METHODS OF MAKING VARIANTS 

It will be recognized that variants of the inventive
molecules can be constructed in several different ways. For
example, they may be constructed as completely synthetic DNAs.
Methods of efficiently synthesizing oligonucleotides in the range
of 20 to about 150 nucleotides are widely available. See Ausubel
et al., supra, section 2.11, Supplement 21 (1993). Overlapping
oligonucleotides may be synthesized and assembled in a fashion
first reported by Khorana et al., J. Mol. Biol. 72:209-217 (1971);
see also Ausubel et al., supra, Section 8.2. The synthetic DNAs are
designed with convenient restriction sites engineered at the 5'
and 3' ends of the gene to facilitate cloning into an appropriate
vector.
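The overlapping-oligonucleotide approach starts by tiling the target sequence into oligos that fall within the synthesizable length range and overlap their neighbours, alternating between strands so that adjacent oligos can anneal. The Python sketch below shows only that tiling step; the 60-nt oligo length and 20-nt overlap are assumed values, and practical designs also balance the melting temperatures of the overlaps and add the terminal restriction sites mentioned above.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def tile_oligos(seq: str, oligo_len: int = 60,
                overlap: int = 20) -> list[tuple[int, str]]:
    """Split seq into overlapping oligos, alternating top and bottom strands.

    Returns (start, oligo) pairs; start is the top-strand coordinate, and
    oligos at odd indices are written as bottom-strand (reverse complement)
    sequences so each one can anneal to its neighbours.
    """
    if not 0 < overlap < oligo_len:
        raise ValueError("need 0 < overlap < oligo_len")
    step = oligo_len - overlap
    oligos = []
    start = 0
    while True:
        piece = seq[start:start + oligo_len]
        on_top = len(oligos) % 2 == 0
        oligos.append((start, piece if on_top else reverse_complement(piece)))
        if start + oligo_len >= len(seq):
            break
        start += step
    return oligos


gene = "ACGT" * 37 + "AC"  # 150-nt toy sequence
for start, oligo in tile_oligos(gene):
    print(start, len(oligo), oligo[:12] + "...")
```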

An alternative method of generating variants is to start with
one of the inventive DNAs and then to conduct site-directed
mutagenesis. See Ausubel et al., supra, chapter 8, Supplement 37
(1997). In a typical method, a target DNA is cloned into a
single-stranded DNA bacteriophage vehicle. Single-stranded DNA is
isolated and hybridized with an oligonucleotide containing the
desired nucleotide alteration(s). The complementary strand is
synthesized and the double-stranded phage is introduced into a
host. Some of the resulting progeny will contain the desired
mutant, which can be confirmed using DNA sequencing. In addition,
various methods are available that increase the probability that
the progeny phage will be the desired mutant. These methods are
well known to those in the field, and kits are commercially
available for generating such mutants.
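Designing the mutagenic oligonucleotide amounts to copying the template around the target position and substituting the desired base. The sketch below assumes a single-base substitution and symmetric 15-nt flanking arms; both the arm length and the function name are hypothetical choices, and practical designs also check the oligo's melting temperature and avoid secondary structure.

```python
def mutagenic_oligo(template: str, position: int, new_base: str,
                    arm: int = 15) -> str:
    """Oligo matching `template` except for `new_base` at 0-based `position`."""
    if not 0 <= position < len(template):
        raise IndexError("position outside template")
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if template[position] == new_base:
        raise ValueError("new_base is identical to the template base")
    left = template[max(0, position - arm):position]
    right = template[position + 1:position + 1 + arm]
    return left + new_base + right


print(mutagenic_oligo("AAAAACCCCCGGGGGTTTTT", position=10, new_base="A", arm=5))
# CCCCCAGGGGT
```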

ISOLATING HOMOLOGS 
Methods 

By using the sequences disclosed herein as probes or as
primers, and techniques such as PCR cloning and colony/plaque
hybridization, one skilled in the art can obtain homologs.
"Homologs" are essentially naturally occurring variants and
include allelic, species-specific and tissue-specific variants.

Region-specific primers or probes derived from the nucleotide
sequence(s) provided can be used to prime DNA synthesis and PCR
amplification, as well as to identify colonies containing cloned
DNA encoding a homolog using known methods (Innis et al., PCR
Protocols, Academic Press, San Diego, CA (1990)). Such an
application is useful in diagnostic methods, as described in more
detail below, as well as in preparing full-length DNAs from
various sources. The PCR primers are preferably at least 15
bases, and more preferably at least 18 bases, in length. When
selecting a primer sequence, it is preferred that the primer pairs
have approximately the same G/C ratio, so that their melting
temperatures are approximately the same. As a general guide, the
formula 4(G+C) + 2(A+T) = °C is useful.
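These primer guidelines can be screened automatically. The sketch below uses the widely cited Wallace rule, Tm ≈ 4(G+C) + 2(A+T) °C, which is only meaningful for short oligonucleotides; the 15-nt minimum comes from the text (18 nt is the stated more-preferred length), while the 2°C tolerance for the Tm difference is an assumed value.

```python
def wallace_tm(primer: str) -> int:
    """Wallace rule: 4 deg C per G/C plus 2 deg C per A/T (short oligos only)."""
    p = primer.upper()
    return 4 * (p.count("G") + p.count("C")) + 2 * (p.count("A") + p.count("T"))


def check_primer_pair(fwd: str, rev: str, min_len: int = 15,
                      max_tm_diff: int = 2) -> list[str]:
    """Return a list of problems; an empty list means the pair passes."""
    problems = []
    for name, primer in (("forward", fwd), ("reverse", rev)):
        if len(primer) < min_len:
            problems.append(f"{name} primer shorter than {min_len} nt")
    diff = abs(wallace_tm(fwd) - wallace_tm(rev))
    if diff > max_tm_diff:
        problems.append(f"Tm values differ by {diff} deg C")
    return problems


print(check_primer_pair("ACGTACGTACGTACGT", "TGCATGCATGCATGCA"))  # []
print(check_primer_pair("ACGT", "GGGGGGGGGGGGGGGG"))
```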

When using primers derived from the inventive sequences, one
skilled in the art will recognize that by employing high
stringency conditions (e.g., annealing at 50-60°C), only sequences
with greater than 75% sequence identity to the primer will be
amplified. By employing lower stringency conditions (e.g.,
annealing at 35-37°C), sequences which have greater than 40-50%
sequence identity to the primer also will be amplified.

The PCR product may be subcloned and sequenced to confirm
that it indeed displays the expected sequence identity. The PCR
fragment may then be used to isolate a full-length cDNA clone by a
variety of methods. For example, the amplified fragment may be
labeled and used to screen a bacteriophage cDNA library.
Alternatively, the labeled fragment may be used to screen a
genomic library.

PCR technology may also be utilized to isolate full-length
cDNA sequences. For example, RNA may be isolated, following
standard procedures, from an appropriate cellular or tissue
source. A reverse transcription reaction may be performed on the
RNA using an oligonucleotide primer specific for the most 5' end
of the amplified fragment for the priming of first strand
synthesis. The resulting RNA/DNA hybrid may then be "tailed" with
guanines using a standard terminal transferase reaction, the
hybrid may be digested with RNase H, and second strand synthesis
may then be primed with a poly-C primer. Thus, cDNA sequences
upstream of the amplified fragment may easily be isolated. For a
review of cloning strategies which may be used, see, e.g., Sambrook
et al., 1989, supra.

When using DNA probes derived from the inventive sequences
for colony/plaque hybridization, one skilled in the art will
recognize that by employing medium to high stringency conditions
(e.g., hybridizing at 50-65°C in 5X SSPC and 50% formamide, and
washing at 50-65°C in 0.5X SSPC), sequences having regions with
greater than 90% sequence identity to the probe can be obtained,
and that by employing lower stringency conditions (e.g.,
hybridizing at 35-37°C in 5X SSPC and 40-45% formamide, and
washing at 42°C in SSPC), sequences having regions with greater
than 35-45% sequence identity to the probe will be obtained.

Suitably, genomic or cDNA libraries can be constructed and
screened in accord with the previous paragraph. The libraries
should be derived from a tissue or organism that is known to
express the gene of interest, or that is suspected of expressing
the gene. The clone containing the homolog may then be purified
through methods routinely practiced in the art, and subjected to
sequence analysis.

Additionally, an expression library can be constructed
utilizing DNA isolated from or cDNA synthesized from a tissue or
organism that is known to express the gene of interest, or that is
suspected of expressing the gene. In this manner, clones may be
induced and screened using standard antibody screening techniques
in conjunction with antibodies raised against the normal gene
product, as described herein. (For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, ANTIBODIES: A
LABORATORY MANUAL, Cold Spring Harbor Press, Cold Spring Harbor.)

Human Homologs 

Any organism or tissue can be used as the source for homologs
of the present invention so long as the organism or tissue
naturally expresses such a protein or contains genes encoding the
same. The most preferred organism for isolating homologs is
human.

PROTEINS OF THE INVENTION 

One class of proteins included within the invention is
encoded by the inventive DNA molecules presented. Other proteins
according to the invention are those encoded by the DNA variants
described above. As noted, these variants are designed with the
encoded proteins in mind.

A preferred class of protein fragments includes those
fragments which retain any biological activity. These molecules
share functional features common to the family of proteins,
although these characteristics may vary in degree.

According to one aspect of the invention, fragments of the
inventive proteins are contemplated. Some preferred fragments are
those which are capable of eliciting an immune response.
Generally, these "antigenic" fragments will be from about five
amino acids in length to about fifty amino acids in length. Some
preferred antigenic fragments are from five to about twenty amino
acids long. An "antigenic" response may refer to a T cell response,
a B cell response or a response by cells of the
macrophage/monocyte lineages. In most cases, however, it will
refer to the immune response involved in the generation of
antibodies. In other words, the relevant immune response is that
of helper T cells and/or B cells. These preferred molecules
comprise one or more T cell and/or B cell epitopes.
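A common way to cover a protein with candidate antigenic fragments in the five-to-fifty residue range is an overlapping peptide set. The Python sketch below generates such a set; the 15-residue length and 5-residue step are assumed values, and the sketch makes no prediction about which peptides actually contain T cell or B cell epitopes.

```python
def overlapping_peptides(protein: str, length: int = 15, step: int = 5) -> list[str]:
    """Tile a protein sequence with overlapping peptides of a fixed length."""
    if not 5 <= length <= 50:
        raise ValueError("length should fall in the 5-50 residue range")
    if step < 1:
        raise ValueError("step must be positive")
    if len(protein) <= length:
        return [protein]
    starts = list(range(0, len(protein) - length + 1, step))
    if starts[-1] + length < len(protein):
        starts.append(len(protein) - length)  # make sure the C-terminus is covered
    return [protein[s:s + length] for s in starts]


toy_protein = "ACDEFGHIKLMNPQRSTVWY" + "ACDEFGHIKLMN"  # 32-residue toy sequence
for peptide in overlapping_peptides(toy_protein):
    print(peptide)
```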

ANTIBODIES OF THE INVENTION

Antibodies raised against the proteins and protein fragments
of the invention also are contemplated by the invention.
Described below are antibody products and methods for producing
antibodies capable of specifically recognizing one or more
epitopes of the presently described proteins and their
derivatives.

Antibodies include, but are not limited to, polyclonal
antibodies, monoclonal antibodies (mAbs), humanized or chimeric
antibodies, single chain antibodies including single chain Fv
(scFv) fragments, Fab fragments, F(ab')2 fragments, fragments
produced by a Fab expression library, anti-idiotypic (anti-Id)
antibodies, epitope-binding fragments, and humanized forms of any
of the above.

As known to one in the art, these antibodies may be used, for
example, in the detection of a target protein in a biological
sample. They also may be utilized as part of treatment methods,
and/or may be used as part of diagnostic techniques whereby
patients may be tested for abnormal levels or for the presence of
abnormal forms of such proteins.

In general, techniques for preparing polyclonal and
monoclonal antibodies as well as hybridomas capable of producing
the desired antibody are well known in the art (Campbell, A.M.,
Monoclonal Antibody Technology: Laboratory Techniques in
Biochemistry and Molecular Biology, Elsevier Science Publishers,
Amsterdam, The Netherlands (1984); St. Groth et al., J. Immunol.
Methods 35:1-21 (1980); Kohler and Milstein, Nature 256:495-497
(1975)), the trioma technique, the human B-cell hybridoma
technique (Kozbor et al., Immunology Today 4:72 (1983); Cole et
al., in Monoclonal Antibodies and Cancer Therapy, Alan R. Liss,
Inc. (1985), pp.). Antibodies may also be generated by the
known techniques of phage display and in vitro immunization.

Polyclonal Antibodies

Polyclonal antibodies are heterogeneous populations of
antibody molecules derived from the sera of animals immunized with
an antigen, such as an inventive protein or an antigenic
derivative thereof.

Polyclonal antisera containing antibodies to heterogeneous
epitopes of a single protein can be prepared by immunizing
suitable animals with the expressed protein described above, which
can be unmodified or modified, as known in the art, to enhance
immunogenicity. Immunization methods include subcutaneous or
intraperitoneal injection of the polypeptide.

Effective polyclonal antibody production is affected by many
factors related both to the antigen and to the host species. For
example, small molecules tend to be less immunogenic than others
and may require the use of carriers and/or adjuvant. In addition,
host animal response may vary with site of inoculation. Both
inadequate and excessive doses of antigen may result in low titer
antisera. In general, however, small doses (high ng to low μg
levels) of antigen administered at multiple intradermal sites
appear to be most reliable. Host animals may include, but are not
limited to, rabbits, mice, chickens and rats, to name but a few.
An effective immunization protocol for rabbits can be found in
Vaitukaitis, J. et al., J. Clin. Endocrinol. Metab. 33:988-991
(1971).

The protein immunogen may be modified or administered in an
adjuvant in order to increase the protein's antigenicity. Methods
of increasing the antigenicity of a protein are well known in the
art and include, but are not limited to, coupling the antigen with
a heterologous protein (such as globulin or β-galactosidase) or
the inclusion of an adjuvant during immunization.
Adjuvants include Freund's (complete and incomplete), mineral gels
such as aluminum hydroxide, surface active substances such as
lysolecithin, pluronic polyols, polyanions, peptides, oil
emulsions, keyhole limpet hemocyanin, dinitrophenol, and
potentially useful human adjuvants such as BCG (bacille
Calmette-Guerin) and Corynebacterium parvum.

Booster injections can be given at regular intervals, with at least one usually being required for optimal antibody production. The antiserum may be harvested when the antibody titer begins to fall. Titer may be determined semi-quantitatively, for example, by double immunodiffusion in agar against known concentrations of the antigen. See, for example, Ouchterlony et al., Chap. 19 in: Handbook of Experimental Immunology, Wier, ed., Blackwell (1973). Plateau concentration of antibody is usually in the range of 0.1 to 0.2 mg/ml of serum (about 12 µM). The antiserum may be purified by affinity chromatography using the immobilized immunogen carried on a solid support. Such methods of affinity chromatography are well known in the art.

Affinity of the antisera for the antigen may be determined by preparing competitive binding curves, as described, for example, by Fisher, Chap. 42 in: Manual of Clinical Immunology, second edition, Rose and Friedman, eds., Amer. Soc. For Microbiology, Washington, D.C. (1980).

In addition to using protein as the immunogen, DNA molecules may be used directly. In this manner, a DNA encoding the protein immunogen is administered. Boosting and harvesting are done in a manner analogous to that detailed above. Yet another method of producing antibodies entails immunizing chickens and harvesting the antibodies from their eggs.

Monoclonal Antibodies

Monoclonal antibodies (MAbs) are homogeneous populations of antibodies to a particular antigen. They may be obtained by any technique that provides for the production of antibody molecules by continuous cell lines in culture or in vivo. MAbs may be produced by making hybridomas, which are immortalized cells capable of secreting a specific monoclonal antibody.
Monoclonal antibodies to any of the proteins, peptides and epitopes thereof described herein can be prepared from murine hybridomas according to the classical method of Kohler, G. and Milstein, C., Nature 256:495-497 (1975) (and U.S. Patent No. 4,376,110) or modifications of the methods thereof, such as the human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72; Cole et al., 1983, Proc. Natl. Acad. Sci. USA 80:2026-2030), and the EBV-hybridoma technique (Cole et al., 1985, MONOCLONAL ANTIBODIES AND CANCER THERAPY, Alan R. Liss, Inc., pp. 77-96).

In one method, a mouse is repetitively inoculated with a few micrograms of the selected protein over a period of a few weeks. The mouse is then sacrificed, and the antibody-producing cells of the spleen are isolated.

The spleen cells are fused, typically using polyethylene glycol, with mouse myeloma cells, such as SP2/0-Ag14 myeloma cells. The excess, unfused cells are destroyed by growth of the system on selective media comprising aminopterin (HAT media). The successfully fused cells are diluted, and aliquots are plated to microtiter plates where growth is continued. Antibody-producing clones (hybridomas) are identified by detection of antibody in the supernatant fluid of the wells by immunoassay procedures. These include ELISA, as originally described by Engvall, Meth. Enzymol. 70:419 (1980), western blot analysis, radioimmunoassay (Lutz et al., Exp. Cell Res. 175:109-124 (1988)) and modified methods thereof.

Selected positive clones can be expanded and their monoclonal antibody product harvested for use. Detailed procedures for monoclonal antibody production are described in Davis, L. et al., BASIC METHODS IN MOLECULAR BIOLOGY, Elsevier, New York, Section 21-2 (1989). The hybridoma clones may be cultivated in vitro or in vivo, for instance as ascites. Production of high titers of mAbs in vivo makes this the presently preferred method of production. Alternatively, hybridoma culture in hollow fiber bioreactors provides a continuous high-yield source of monoclonal antibodies.

The antibody class and subclass may be determined using procedures known in the art (Campbell, A.M., Monoclonal Antibody Technology: Laboratory Techniques in Biochemistry and Molecular Biology, Elsevier Science Publishers, Amsterdam, The Netherlands (1984)). MAbs may be of any immunoglobulin class, including IgG, IgM, IgE, IgA, IgD, and any subclass thereof. Methods of purifying monoclonal antibodies are well known in the art.
Antibody Derivatives and Fragments

Fragments or derivatives of antibodies include any portion of the antibody that is capable of binding the target antigen, or a specific portion thereof. Antibody derivatives include polyspecific (e.g., bispecific) antibodies, which contain binding sites specific for two or more different epitopes. These epitopes may be from the same or different inventive molecules, or one or more epitopes may be from a molecule not specifically disclosed here.

Antibody fragments specifically include F(ab')2, Fab, Fab' and Fv fragments. These can be generated from any class of antibody, but typically are made from IgG or IgM. They may be made by conventional recombinant DNA techniques or, using the classical method, by proteolytic digestion with papain or pepsin. See CURRENT PROTOCOLS IN IMMUNOLOGY, chapter 2, Coligan et al., eds. (John Wiley & Sons 1991-92).

F(ab')2 fragments are typically about 110 kDa (IgG) or about 150 kDa (IgM) and contain two antigen-binding regions, joined at the hinge by disulfide bond(s). Virtually all, if not all, of the Fc is absent in these fragments. Fab' fragments are typically about 55 kDa (IgG) or about 75 kDa (IgM) and can be formed, for example, by reducing the disulfide bond(s) of an F(ab')2 fragment. The resulting free sulfhydryl group(s) may be used to conveniently conjugate Fab' fragments to other molecules, such as detection reagents (e.g., enzymes).

Fab fragments are monovalent and usually are about 50 kDa (from any source). Fab fragments include the light (L) and heavy (H) chain, variable (VL and VH, respectively) and constant (CL and CH, respectively) regions of the antigen-binding portion of the antibody. The H and L portions are linked by an intramolecular disulfide bridge.

WO 01/98454 PCT/IB01/02050

Fv fragments are typically about 25 kDa (regardless of source) and contain the variable regions of both the light and heavy chains (VL and VH, respectively). Usually, the VL and VH chains are held together only by non-covalent interactions and, thus, they readily dissociate. They do, however, have the advantage of small size, and they retain the same binding properties as the larger Fab fragments. Accordingly, methods have been developed to crosslink the VL and VH chains using, for example, glutaraldehyde (or other chemical crosslinkers), intermolecular disulfide bonds (by incorporation of cysteines) and peptide linkers. The resulting Fv is now a single chain (i.e., scFv).

Other antibody derivatives include single chain antibodies (U.S. Patent 4,946,778; Bird, Science 242:423-426 (1988); Huston et al., Proc. Natl. Acad. Sci. USA 85:5879-5883 (1988); and Ward et al., Nature 334:544-546 (1989)). Single chain antibodies are formed by linking the heavy and light chain fragments of the Fv region via an amino acid bridge, resulting in a single chain Fv (scFv).

One preferred method involves the generation of scFvs by recombinant methods, which allows the generation of Fvs with new specificities by mixing and matching variable chains from different antibody sources. In a typical method, a recombinant vector would be provided that comprises the appropriate regulatory elements driving expression of a cassette region. The cassette region would contain a DNA encoding a peptide linker, with convenient sites at both the 5' and 3' ends of the linker for generating fusion proteins. The DNA encoding a variable region(s) of interest may be cloned in the vector to form fusion proteins with the linker, thus generating an scFv.

In an exemplary alternative approach, DNAs encoding the two Fv variable regions may be ligated to the DNA encoding the linker, and the resulting tripartite fusion may be ligated directly into a conventional expression vector. The scFv DNAs generated by any of these methods may be expressed in prokaryotic or eukaryotic cells, depending on the vector chosen.
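By way of illustration only, the in-frame assembly of the tripartite VH-linker-VL fusion described above might be checked with a short Python sketch such as the following. The (Gly4Ser)3 linker, the codon chosen for each linker residue and the function names are assumptions introduced for this example; the specification does not prescribe any of them.

```python
# Hypothetical sketch: join VH, a peptide-linker cassette and VL into one scFv ORF.

# Assumption: a (Gly4-Ser)3 linker, a common choice that the text does not specify.
LINKER_PEPTIDE = "GGGGS" * 3
# Illustrative codon choice for each linker residue.
LINKER_CODON = {"G": "GGT", "S": "TCT"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def linker_dna() -> str:
    """Back-translate the linker peptide using the fixed codon choices above."""
    return "".join(LINKER_CODON[aa] for aa in LINKER_PEPTIDE)


def build_scfv(vh: str, vl: str) -> str:
    """Return VH + linker + VL as a single open reading frame (hypothetical helper).

    Each variable-region DNA must be a whole number of codons with no internal
    stop codon; otherwise the downstream segment would be out of frame or cut short.
    """
    vh, vl = vh.upper(), vl.upper()
    for name, region in (("VH", vh), ("VL", vl)):
        if len(region) % 3:
            raise ValueError(f"{name} length {len(region)} is not a multiple of 3")
        codons = (region[i:i + 3] for i in range(0, len(region), 3))
        if any(codon in STOP_CODONS for codon in codons):
            raise ValueError(f"{name} contains an in-frame stop codon")
    return vh + linker_dna() + vl


if __name__ == "__main__":
    # Toy 3-codon "variable regions"; real VH/VL genes are several hundred bases.
    print(build_scfv("ATGGAGGTG", "GACATCCAG"))
```

The VH-linker-VL order is only one option; the same check applies unchanged if the regions are swapped to give a VL-linker-VH construct.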

Antibody fragments that recognize specific epitopes may be generated by known techniques. For example, such fragments include, but are not limited to, the F(ab')2 fragments, which can be produced by pepsin digestion of the antibody molecule, and the Fab fragments, which can be generated by reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab expression libraries may be constructed (Huse et al., 1989, Science 246:1275-1281) to allow rapid and easy identification of monoclonal Fab fragments with the desired specificity.
Derivatives also include "chimeric antibodies" (Morrison et al., Proc. Natl. Acad. Sci. 81:6851-6855 (1984); Neuberger et al., Nature 312:604-608 (1984); Takeda et al., Nature 314:452-454 (1985)). These chimeras are made by splicing the DNA encoding a mouse antibody molecule of appropriate specificity with, for instance, DNA encoding a human antibody molecule of appropriate specificity. Thus, a chimeric antibody is a molecule in which different portions are derived from different animal species, such as those having a variable region derived from a murine mAb and a human immunoglobulin constant region. These are also sometimes known as "humanized" antibodies, and they offer the added advantage of at least partial shielding from the human immune system. They are, therefore, particularly useful in therapeutic in vivo applications.

Labeled Antibodies

The present invention further provides the above-described antibodies in detectably labeled form. Antibodies can be detectably labeled through the use of radioisotopes, affinity labels (such as biotin, avidin, etc.), enzymatic labels (such as horseradish peroxidase, alkaline phosphatase, etc.), fluorescent labels (such as FITC or rhodamine, etc.), paramagnetic atoms, etc. Procedures for accomplishing such labeling are well known in the art; see, for example, Sternberger et al., J. Histochem. Cytochem. 18:315 (1970); Bayer et al., Meth. Enzym. 62:308 (1979); Engval et al., Immunol. 109:129 (1972); Goding, J. Immunol. Meth. 13:215 (1976). The labeled antibodies of the present invention can be used for in vitro, in vivo, and in situ diagnostic assays.

Immobilized Antibodies

The foregoing antibodies also may be immobilized on a solid support. Examples of such solid supports include plastics such as polycarbonate, complex carbohydrates such as agarose and sepharose, acrylic resins such as polyacrylamide, and latex beads. Techniques for coupling antibodies to such solid supports are well known in the art (Weir et al., "Handbook of Experimental Immunology," 4th Ed., Blackwell Scientific Publications, Oxford, England, Chapter 10 (1986); Jacoby et al., Meth. Enzym. 34, Academic Press, N.Y. (1974)). The immobilized antibodies of the present invention can be used for in vitro, in vivo, and in situ assays as well as for immunoaffinity purification of the proteins of the present invention.

THERAPEUTIC AND DIAGNOSTIC COMPOSITIONS

The proteins, antibodies and polynucleotides of the present invention can be formulated according to known methods to prepare pharmaceutically useful compositions, whereby these materials, or their functional derivatives, are combined in admixture with a pharmaceutically acceptable carrier vehicle. Suitable vehicles and their formulation, inclusive of other human proteins, e.g., human serum albumin, are described, for example, in Remington's Pharmaceutical Sciences (16th ed., Osol, A., Ed., Mack, Easton PA (1980)). In order to form a pharmaceutically acceptable composition suitable for effective administration, such compositions will contain an effective amount of one or more of the agents of the present invention, together with a suitable amount of carrier vehicle.

Pharmaceutical compositions for use in accordance with the present invention may be formulated in conventional manner using one or more physiologically acceptable carriers or excipients. Thus, the compounds and their physiologically acceptable salts and solvates may be formulated for administration by inhalation or insufflation (either through the mouth or the nose) or for oral, buccal, parenteral or rectal administration.

For oral administration, the pharmaceutical compositions may take the form of, for example, tablets or capsules prepared by conventional means with pharmaceutically acceptable excipients such as binding agents (e.g., pregelatinised maize starch, polyvinylpyrrolidone or hydroxypropyl methylcellulose); fillers (e.g., lactose, microcrystalline cellulose or calcium hydrogen phosphate); lubricants (e.g., magnesium stearate, talc or silica); disintegrants (e.g., potato starch or sodium starch glycolate); or wetting agents (e.g., sodium lauryl sulphate). The tablets may be coated by methods well known in the art. Liquid preparations for oral administration may take the form of, for example, solutions, syrups or suspensions, or they may be presented as a dry product for constitution with water or other suitable vehicle before use. Such liquid preparations may be prepared by conventional means with pharmaceutically acceptable additives such as suspending agents (e.g., sorbitol syrup, cellulose derivatives or hydrogenated edible fats); emulsifying agents (e.g., lecithin or acacia); non-aqueous vehicles (e.g., almond oil, oily esters, ethyl alcohol or fractionated vegetable oils); and preservatives (e.g., methyl or propyl-p-hydroxybenzoates or sorbic acid). The preparations may also contain buffer salts, flavoring, coloring and sweetening agents as appropriate.
Preparations for oral administration may be suitably formulated to give controlled release of the active compound. For buccal administration, the composition may take the form of tablets or lozenges formulated in conventional manner.

For administration by inhalation, the compounds for use according to the present invention are conveniently delivered in the form of an aerosol spray presentation from pressurized packs or a nebuliser, with the use of a suitable propellant, e.g., dichlorodifluoromethane, trichlorofluoromethane, dichlorotetrafluoroethane, carbon dioxide or other suitable gas. In the case of a pressurized aerosol, the dosage unit may be determined by providing a valve to deliver a metered amount. Capsules and cartridges of, e.g., gelatin for use in an inhaler or insufflator may be formulated containing a powder mix of the compound and a suitable powder base such as lactose or starch.

The compounds may be formulated for parenteral administration by injection, e.g., by bolus injection or continuous infusion. Formulations for injection may be presented in unit dosage form, e.g., in ampules or in multi-dose containers, with an added preservative. The compositions may take such forms as suspensions, solutions or emulsions in oily or aqueous vehicles, and may contain formulatory agents such as suspending, stabilizing and/or dispersing agents. Alternatively, the active ingredient may be in powder form for constitution with a suitable vehicle, e.g., sterile pyrogen-free water, before use.

The compounds may also be formulated in rectal compositions such as suppositories or retention enemas, e.g., containing conventional suppository bases such as cocoa butter or other glycerides.

In addition to the formulations described previously, the compounds may also be formulated as a depot preparation. Such long-acting formulations may be administered by implantation (for example, subcutaneously or intramuscularly) or by intramuscular injection. Thus, for example, the compounds may be formulated with suitable polymeric or hydrophobic materials (for example, as an emulsion in an acceptable oil) or ion exchange resins, or as sparingly soluble derivatives, for example, as a sparingly soluble salt.

The compositions may, if desired, be presented in a pack or dispenser device, which may contain one or more unit dosage forms containing the active ingredient. The pack may, for example, comprise metal or plastic foil, such as a blister pack. The pack or dispenser device may be accompanied by instructions for administration.

RECOMBINANT CONSTRUCTS AND EXPRESSION

The present invention further provides recombinant DNA constructs comprising one or more of the nucleotide sequences of the present invention. The recombinant constructs of the present invention comprise a vector, such as a plasmid or viral vector, into which a DNA or DNA fragment, typically bearing an open reading frame, is inserted, in either orientation. The gene products encoded by the subject DNAs may be produced by recombinant DNA technology using techniques well known in the art. See, for example, the techniques described in Sambrook et al., 1989, supra, and Ausubel et al., 1989, supra. Alternatively, the DNA sequences may be chemically synthesized using, for example, synthesizers. See, for example, the techniques described in OLIGONUCLEOTIDE SYNTHESIS, 1984, Gait, ed., IRL Press, Oxford, which is incorporated by reference herein in its entirety. They may be assembled from fragments and short oligonucleotide linkers, or from a series of oligonucleotides. They are preferably made by RT-PCR methods. The resulting synthetic gene is capable of being expressed in a recombinant vector.

In some cases the recombinant constructs will be expression vectors, which are capable of expressing the RNA and/or protein products of the encoded DNA(s). Thus, the vector may further comprise regulatory sequences, including, for example, a promoter operably linked to the open reading frame (ORF). The vector may further comprise a selectable marker sequence.

Specific initiation signals may also be required for efficient translation of inserted target gene coding sequences. These signals include the ATG initiation codon and adjacent sequences. In cases where a target DNA that includes its own initiation codon and adjacent sequences is inserted into the appropriate expression vector, no additional translation control signals may be needed. However, in cases where only a portion of an ORF is used, exogenous translational control signals, including, perhaps, the ATG initiation codon, must be provided. Furthermore, the initiation codon must be in phase with the reading frame of the desired coding sequence to ensure translation of the entire target. These exogenous translational control signals and initiation codons can be of a variety of origins, both natural and synthetic. The efficiency of expression may be enhanced by the inclusion of appropriate transcription enhancer elements, transcription terminators, etc. (see Bittner et al., Methods in Enzymol. 153:516-544 (1987)). Some appropriate cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described by Sambrook et al., in Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor, New York (1989), the disclosure of which is hereby incorporated by reference.
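The requirement that an exogenous initiation codon be in phase with the target can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that the vector sequence from its ATG up to the cloning junction is known and that the target ends in its own stop codon; the function names are hypothetical.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop(seq: str) -> Optional[int]:
    """Index of the first stop codon when seq is read from position 0, else None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


def translates_entire_target(upstream: str, target: str) -> bool:
    """Check that an exogenous ATG keeps `target` in phase and unterminated.

    `upstream`: hypothetical vector sequence from the ATG up to the cloning junction.
    `target`:   the inserted coding sequence, ending in its own stop codon.
    """
    upstream, target = upstream.upper(), target.upper()
    if not upstream.startswith("ATG"):
        raise ValueError("upstream must begin with the ATG initiation codon")
    if len(upstream) % 3:
        return False  # junction falls mid-codon, so the target is read out of frame
    fused = upstream + target
    stop = first_stop(fused)
    # The whole target is read only if the first stop is the final codon (or absent).
    return stop is None or stop == len(fused) - 3


if __name__ == "__main__":
    print(translates_entire_target("ATGGCT", "GAAAAATAA"))  # in phase
    print(translates_entire_target("ATGGC", "GAAAAATAA"))   # one base short
```

A one-base change at the junction is enough to move the target out of phase, which is why the text stresses the position of the initiation codon rather than its mere presence.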

If desired, to enhance expression and facilitate proper protein folding, the codon context and codon pairing of the sequence may be optimized for the particular expression organism, as explained by Hatfield et al., U.S. Patent No. 5,082,767.

The present invention further provides host cells containing at least one of the DNAs of the present invention. The host cell can be virtually any cell for which expression vectors are available. It may be, for example, a higher eukaryotic host cell, such as a mammalian cell, a lower eukaryotic host cell, such as a yeast cell, or the host cell can be a prokaryotic cell, such as a bacterial cell. Introduction of the recombinant construct into the host cell can be effected by calcium phosphate transfection, DEAE-dextran mediated transfection, or electroporation (Davis et al., Basic Methods in Molecular Biology (1986)).

A wide variety of expression systems are available, such as: yeast (e.g., Saccharomyces, Pichia) transformed with recombinant yeast expression vectors containing the target DNA; insect cell systems infected with recombinant virus expression vectors (e.g., baculovirus) containing the target DNA sequences; plant cell systems infected with recombinant virus expression vectors (e.g., cauliflower mosaic virus, CaMV; tobacco mosaic virus, TMV) or transformed with recombinant plasmid expression vectors (e.g., Ti plasmid) containing target DNA coding sequences; or mammalian cell systems (e.g., COS, CHO, BHK, 293, 3T3) harboring recombinant expression constructs containing promoters derived from the genome of mammalian cells (e.g., metallothionein promoter) or from mammalian viruses (e.g., the adenovirus late promoter; the vaccinia virus 7.5K promoter).

Depending on the system chosen, the resulting product may differ. For example, proteins expressed in most bacterial cultures, e.g., E. coli, will be free of glycosylation modifications, and polypeptides or proteins expressed in yeast will have a glycosylation pattern different from that of proteins expressed in mammalian cells.
Vectors

Generally, recombinant expression vectors will include origins of replication and selectable markers permitting selection of the host cell, e.g., the ampicillin resistance gene of E. coli and the S. cerevisiae TRP1 gene, and a promoter derived from a highly expressed gene to direct transcription of a downstream structural sequence. Such promoters can be derived from operons encoding glycolytic enzymes such as 3-phosphoglycerate kinase (PGK), α-factor, acid phosphatase, or heat shock proteins, among others. The heterologous structural sequence is assembled in appropriate phase with translation initiation and termination sequences and, in one aspect of the invention, a leader sequence capable of directing secretion of translated protein into the periplasmic space or extracellular medium. Optionally, the heterologous sequence can encode a fusion protein including an N-terminal or C-terminal identification peptide imparting desired characteristics, e.g., stabilization or simplified purification of expressed recombinant product.
Bacterial Expression

Useful expression vectors for bacterial use are constructed by inserting a structural DNA sequence encoding a desired protein together with suitable translation initiation and termination signals in operable reading phase with a functional promoter. The vector will comprise one or more phenotypic selectable markers and an origin of replication to ensure maintenance of the vector and, if desirable, to provide amplification within the host. Suitable prokaryotic hosts for transformation include E. coli, Bacillus subtilis, Salmonella typhimurium and various species within the genera Pseudomonas, Streptomyces, and Staphylococcus, although others may also be employed as a matter of choice.

Bacterial vectors may be, for example, bacteriophage-, plasmid- or cosmid-based. These vectors can comprise a selectable marker and bacterial origin of replication derived from commercially available plasmids typically containing elements of the well-known cloning vector pBR322 (ATCC 37017). Such commercial vectors include, for example, GEM 1 (Promega Biotec, Madison, WI, USA), pBs, phagescript, PsiX174, pBluescript SK, pBs KS, pNH8a, pNH16a, pNH18a, pNH46a (Stratagene); pTrc99A, pKK223-3, pKK233-3, pKK232-8, pDR540, and pRIT5 (Pharmacia).

These "backbone" sections are combined with an appropriate promoter and the structural sequence to be expressed. Bacterial promoters include lac, T3, T7, lambda PR or PL, trp, and ara.

Following transformation of a suitable host strain and growth of the host strain to an appropriate cell density, the selected promoter is derepressed/induced by appropriate means (e.g., temperature shift or chemical induction) and cells are cultured for an additional period. Cells are typically harvested by centrifugation, disrupted by physical or chemical means, and the resulting crude extract retained for further purification.

In bacterial systems, a number of expression vectors may be advantageously selected depending upon the use intended for the protein being expressed. For example, when a large quantity of such a protein is to be produced for the generation of antibodies or to screen peptide libraries, vectors that direct the expression of high levels of readily purified fusion protein products may be desirable. Such vectors include, but are not limited to, the E. coli expression vector pUR278 (Ruther et al., 1983, EMBO J. 2:1791), in which the coding sequence may be ligated into the vector in frame with the lacZ coding region so that a fusion protein is produced; pIN vectors (Inouye et al., 1985, Nucleic Acids Res. 13:3101-3109; Van Heeke et al., 1989, J. Biol. Chem. 264:5503-5509); pET vectors (Studier et al., Methods in Enzymology 185:60-89 (Academic Press 1990)); and the like.

Moreover, pGEX vectors may be used to express foreign polypeptides as fusion proteins with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by elution in the presence of free glutathione. The pGEX vectors are designed to include thrombin or factor Xa protease cleavage sites so that the cloned target gene protein can be released from the GST moiety.
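The release step can be pictured with a small Python sketch that locates a protease site in a GST fusion and returns the freed target. The recognition motifs (thrombin cutting LVPR|GS, factor Xa cutting after I-E/D-G-R) are assumed here from commonly cited consensus sequences rather than taken from the text, the function names are hypothetical, and real cleavage also depends on accessibility of the site in the folded protein.

```python
import re

# Assumed consensus motifs (not stated in the text): thrombin cuts LVPR|GS,
# factor Xa cuts after I-(E/D)-G-R.
PROTEASE_SITES = {
    "thrombin": re.compile(r"LVPR(?=GS)"),
    "factor Xa": re.compile(r"I[ED]GR"),
}


def cut_positions(protein: str, protease: str) -> list[int]:
    """0-based indices of the first residue after each cut for one protease."""
    pattern = PROTEASE_SITES[protease]
    return [m.end() for m in pattern.finditer(protein.upper())]


def release_from_gst(fusion: str, protease: str) -> str:
    """Return the part of a GST fusion lying C-terminal to the first cleavage site."""
    cuts = cut_positions(fusion, protease)
    if not cuts:
        raise ValueError(f"no {protease} site in fusion")
    return fusion.upper()[cuts[0]:]


if __name__ == "__main__":
    # "MSPILGYWK" is a toy stand-in for the GST moiety; "PEFMAHHK" for the target.
    print(release_from_gst("MSPILGYWK" + "LVPRGS" + "PEFMAHHK", "thrombin"))
```

Under these assumptions the thrombin product retains the two residues Gly-Ser at its amino terminus, whereas the factor Xa motif places the cut directly before the first target residue.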

In one embodiment, full-length cDNA sequences are appended with in-frame BamHI sites at the amino terminus and EcoRI sites at the carboxyl terminus using standard PCR methodologies (Innis et al., 1990, supra) and ligated into the pGEX-2TK vector (Pharmacia, Uppsala, Sweden). The resulting cDNA construct contains a kinase recognition site at the amino terminus for radioactive labeling and glutathione S-transferase sequences at the carboxyl terminus for affinity purification (Nilsson et al., 1985, EMBO J. 4:1075; Zabeau and Stanley, 1982, EMBO J. 1:1217).
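A minimal Python sketch of how PCR primers could append a 5' BamHI site and a 3' EcoRI site to a cDNA is shown below. The annealing length, the two-base "clamp" and the helper names are illustrative assumptions; the in-frame claim holds only if the vector reads the BamHI site as the codons GGA TCC, which should be checked against the actual vector map.

```python
BAMHI = "GGATCC"  # BamHI recognition site (palindromic)
ECORI = "GAATTC"  # EcoRI recognition site (palindromic)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an ACGT DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def design_cloning_primers(cds: str, anneal: int = 18, clamp: str = "CG") -> tuple[str, str]:
    """Return (forward, reverse) primers adding BamHI 5' and EcoRI 3' of a cDNA ORF.

    Hypothetical helper. `anneal` is the number of template-matching bases and
    `clamp` a few extra bases so the enzymes can cut near the fragment ends.
    """
    cds = cds.upper()
    if anneal > len(cds):
        raise ValueError("annealing region longer than the cDNA")
    for name, site in (("BamHI", BAMHI), ("EcoRI", ECORI)):
        if site in cds:
            raise ValueError(f"cDNA has an internal {name} site and would be cut")
    # The ORF starts immediately after GGATCC, so if the vector reads the site as
    # the codons GGA TCC, the cDNA stays in frame with the upstream fusion partner.
    forward = clamp + BAMHI + cds[:anneal]
    # EcoRI is palindromic, so the reverse primer carries the same six bases.
    reverse = clamp + ECORI + reverse_complement(cds[-anneal:])
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = design_cloning_primers("ATGAAACCCGGGTTTTAA", anneal=6)
    print(fwd, rev)
```

Screening the cDNA for internal BamHI or EcoRI sites before amplification matters because digestion of the PCR product would otherwise cut the coding sequence itself.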

Eukaryotic Expression

Various mammalian cell culture systems can also be employed to express recombinant protein. Examples of mammalian expression systems include the COS-7 lines of monkey kidney fibroblasts, described by Gluzman, Cell 23:175 (1981), and other cell lines capable of expressing a compatible vector, for example, the C127, 3T3, CHO, HeLa and BHK cell lines. Mammalian expression vectors will comprise an origin of replication, a suitable promoter and enhancer, and also any necessary ribosome binding sites, polyadenylation site, splice donor and acceptor sites, transcriptional termination sequences, and 5' flanking nontranscribed sequences. DNA sequences derived from the SV40 viral genome, for example, SV40 origin, early promoter, enhancer, splice, and polyadenylation sites, may be used to provide the required nontranscribed genetic elements.
Mammalian promoters include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from retrovirus, and mouse metallothionein-I. Exemplary mammalian vectors include pWLneo, pSV2cat, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV, pMSG, and pSVL (Pharmacia). Selectable markers include CAT (chloramphenicol acetyltransferase).

In mammalian host cells, a number of viral-based expression systems may be utilized. In cases where an adenovirus is used as an expression vector, the coding sequence of interest may be ligated to an adenovirus transcription/translation control complex, e.g., the late promoter and tripartite leader sequence. This chimeric gene may then be inserted in the adenovirus genome by in vitro or in vivo recombination. Insertion in a non-essential region of the viral genome (e.g., region E1 or E3) will result in a recombinant virus that is viable and capable of expressing a target protein in infected hosts (see, e.g., Logan et al., 1984, Proc. Natl. Acad. Sci. USA 81:3655-3659).

In one embodiment, cDNA sequences encoding the full-length open reading frames are ligated into pCMVβ, replacing the β-galactosidase gene, such that cDNA expression is driven by the CMV promoter (Alam, 1990, Anal. Biochem. 188:245-254; MacGregor et al., 1989, Nucl. Acids Res. 17:2365; Norton et al., 1985, Mol. Cell. Biol. 5:281).
In addition, a host cell strain may be chosen that modulates the expression of the inserted sequences, or modifies and processes the gene product in the specific fashion desired. Such modifications (e.g., glycosylation) and processing (e.g., cleavage) of protein products may be important for the function of the protein. Different host cells have characteristic and specific mechanisms for the post-translational processing and modification of proteins.
Appropriate cell lines or host systems can be chosen to ensure the correct modification and processing of the foreign protein expressed. To this end, eukaryotic host cells that possess the cellular machinery for proper processing of the primary transcript, glycosylation, and phosphorylation of the gene product may be used. Such mammalian host cells include, but are not limited to, CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, etc.

For long-term, high-yield production of recombinant proteins in eukaryotic cells, stable expression is preferred. Rather than using expression vectors that contain viral origins of replication, host cells can be transformed with DNA controlled by appropriate expression control elements (e.g., promoter, enhancer sequences, transcription terminators, polyadenylation sites, etc.) and a selectable marker.

Following the introduction of the foreign DNA, engineered cells may be allowed to grow for 1-2 days in an enriched medium and then switched to a selective medium. The selectable marker in the recombinant plasmid confers resistance to the selection and allows cells to stably integrate the plasmid into their chromosomes and grow to form foci, which in turn can be cloned and expanded into cell lines. This method may advantageously be used to engineer cell lines that express the target protein. Such engineered cell lines may be particularly useful in screening and evaluation of compounds that affect the endogenous activity of the protein.

A number of selection systems may be used, including but not
limited to the herpes simplex virus thymidine kinase (Wigler et
al., Cell 11:223 (1977)), hypoxanthine-guanine
phosphoribosyltransferase (Szybalska et al., Proc. Natl. Acad.
Sci. USA 48:2026) and adenine phosphoribosyltransferase (Lowy et
al., Cell 22:817 (1980)) genes, which can be employed in tk-,
hgprt- or aprt- cells, respectively. Also, antimetabolite
resistance can be used as the basis of selection for dhfr, which
confers resistance to methotrexate (Wigler et al., Proc. Natl.
Acad. Sci. USA 77:3567 (1980); O'Hare et al., Proc. Natl. Acad.
Sci. USA 78:1527 (1981)); gpt, which confers resistance to
mycophenolic acid (Mulligan et al., Proc. Natl. Acad. Sci. USA
78:2072 (1981)); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin et al., J. Mol. Biol.
150:1 (1981)); and hygro, which confers resistance to hygromycin
(Santerre et al., Gene 30:147 (1984)).
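
The marker/selection pairings listed above can be held in a small
lookup table. The following is a minimal sketch, assuming a
hypothetical data layout (`SelectionSystem`, `SYSTEMS`,
`markers_for_host` are illustrative names, not part of the
disclosure); it encodes only what the text states, namely that
tk, hgprt and aprt require a host lacking the matching enzyme,
while the other markers are listed with the agent they resist.

```python
from typing import NamedTuple, Optional


class SelectionSystem(NamedTuple):
    marker: str                      # selectable marker gene
    host_requirement: Optional[str]  # enzyme the host must lack, if the text states one
    resistance_to: Optional[str]     # selective agent the marker confers resistance to


# Pairings as listed in the text; the table layout itself is illustrative.
SYSTEMS = [
    SelectionSystem("tk", "tk", None),
    SelectionSystem("hgprt", "hgprt", None),
    SelectionSystem("aprt", "aprt", None),
    SelectionSystem("dhfr", None, "methotrexate"),
    SelectionSystem("gpt", None, "mycophenolic acid"),
    SelectionSystem("neo", None, "aminoglycoside G-418"),
    SelectionSystem("hygro", None, "hygromycin"),
]


def markers_for_host(host_deficiencies: set[str]) -> list[str]:
    """Return markers usable with a host line.

    Markers with no stated host requirement are always included;
    tk/hgprt/aprt are included only when the host lacks that enzyme.
    """
    return [
        s.marker
        for s in SYSTEMS
        if s.host_requirement is None or s.host_requirement in host_deficiencies
    ]


print(markers_for_host({"tk"}))  # ['tk', 'dhfr', 'gpt', 'neo', 'hygro']
```

A named tuple keeps each pairing immutable and self-describing,
which suits a fixed reference table.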

An alternative fusion protein system allows for the ready
purification of non-denatured fusion proteins expressed in human
cell lines (Janknecht et al., Proc. Natl. Acad. Sci. USA
88:8972-8976 (1991)). In this system, the gene of interest is
subcloned into a vaccinia-based plasmid such that the gene's open
reading frame is translationally fused to an amino-terminal tag
consisting of six histidine residues. Extracts from cells
infected with recombinant vaccinia virus are loaded onto
Ni2+-nitriloacetic acid-agarose columns and histidine-tagged
proteins are selectively eluted with imidazole-containing buffers.
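
At the sequence level, the amino-terminal fusion described above
amounts to placing six histidine codons in frame directly after
the start codon. The sketch below is hypothetical: it assumes the
tag immediately follows the initiating Met with no linker and uses
CAC for every histidine (CAT would serve equally); actual vector
designs may add spacer or cleavage-site residues that the text does
not specify.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def add_n_terminal_his6(orf: str) -> str:
    """Insert six histidine codons (CAC) right after the ATG start codon."""
    orf = orf.upper()
    if not orf.startswith("ATG"):
        raise ValueError("ORF must begin with an ATG start codon")
    return "ATG" + "CAC" * 6 + orf[3:]


def translate(orf: str) -> str:
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        residue = CODON_TABLE[orf[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


tagged = add_n_terminal_his6("ATGGCTAAATAA")  # hypothetical toy ORF: Met-Ala-Lys-stop
print(translate(tagged))  # MHHHHHHAK
```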

In an insect system, Autographa californica nuclear
polyhedrosis virus (AcNPV) is used as a vector to express foreign
genes. The virus grows in Spodoptera frugiperda cells. The
target coding sequence may be cloned individually into
non-essential regions (for example, the polyhedrin gene) of the
virus and placed under control of an AcNPV promoter (for example,
the polyhedrin promoter). Successful insertion of a target gene
coding sequence will result in inactivation of the polyhedrin gene
and production of non-occluded recombinant virus (i.e., virus
lacking the proteinaceous coat coded for by the polyhedrin gene).
These recombinant viruses are then used to infect Spodoptera
frugiperda cells in which the inserted gene is expressed (see,
e.g., Smith et al., J. Virol. 46:584 (1983); Smith, U.S. Patent
No. 4,215,051).

While the present proteins can be expressed in recombinant
systems, as described above, cell-free translation systems can
also be employed to produce such proteins using RNAs derived from
the DNA constructs of the present invention.
Purification of Recombinant Proteins

Recombinant proteins produced may be isolated by host cell
lysis. This may be followed by one or more salting-out, aqueous
ion exchange or size exclusion chromatography steps. Finally,
high performance liquid chromatography (HPLC) can be employed for
final purification steps. Microbial cells employed in expression
of proteins can be disrupted by any convenient method, including
freeze-thaw cycling, sonication, mechanical disruption, or use of
cell lysing agents, like lysozyme and chelators.

If inclusion bodies are formed in bacterial systems, they may
be extracted from cell pellets using, for example, detergents,
reducing agents, salts, urea, guanidinium chloride and extremes of
pH (e.g., <4 or >10). If denaturation occurs, protein refolding
steps (e.g., dialysis) can be used, as necessary, in completing
configuration of the mature protein. If disulfide bridges are
present in the native protein, they may be reoxidized using known
methods.

By way of specific non-limiting example, the recombinant
bacterial cells, for example E. coli, are grown in any of a number
of suitable media, for example LB, and the expression of the
recombinant protein is induced by adding IPTG (e.g., with a lac
operator-promoter) to the media or by switching incubation to a
higher temperature (e.g., with λcI857). After culturing the
bacteria for a further period of between 2 and 24 hours, the cells
are collected by centrifugation and washed to remove residual
media. The bacterial cells are then lysed, for example, by
disruption in a cell homogenizer, and centrifuged to separate the
cell membranes from the soluble cell components. If the protein
aggregates into inclusion bodies, this centrifugation can be
performed under conditions whereby the dense inclusion bodies are
selectively enriched by incorporation of sugars such as sucrose
into the buffer and centrifugation at a selective speed. The
inclusion bodies can then be washed in any of several solutions to
remove some of the contaminating host proteins, then solubilized
in solutions containing high concentrations of urea (e.g., 8 M) or
chaotropic agents such as guanidinium hydrochloride in the
presence of reducing agents such as β-mercaptoethanol or DTT
(dithiothreitol).

At this stage it may be advantageous to incubate the protein
for several hours under conditions suitable for the protein to
undergo a refolding process into a conformation which more closely
resembles that of the native protein. Such conditions generally
include low protein concentrations (less than 500 µg/ml), low
levels of reducing agent, concentrations of urea less than 2 M,
and often the presence of reagents such as a mixture of reduced
and oxidized glutathione which facilitate the interchange of
disulphide bonds within the protein molecule. The refolding
process can be monitored, for example, by SDS-PAGE or with
antibodies which are specific for the native molecule. Following
refolding, the protein can then be purified further and separated
from the refolding mixture by chromatography on any of several
supports including ion exchange resins, gel permeation resins or
on a variety of affinity columns.
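
Moving from solubilization (e.g., 8 M urea) to refolding
conditions (urea below 2 M, protein below 500 µg/ml) is a dilution
problem: the required factor is set by whichever limit is harder to
meet. A minimal sketch, assuming dilution into urea-free buffer;
the function name and the 5 mg/ml starting concentration are
hypothetical illustrations, not values from the text.

```python
def min_refolding_dilution(
    urea_molar: float,
    protein_ug_per_ml: float,
    urea_limit_molar: float = 2.0,
    protein_limit_ug_per_ml: float = 500.0,
) -> float:
    """Smallest dilution factor that brings urea and protein down to the limits.

    Assumes dilution into urea-free buffer. Because the text calls for values
    *below* the limits, dilute somewhat beyond the returned factor.
    """
    return max(
        urea_molar / urea_limit_molar,
        protein_ug_per_ml / protein_limit_ug_per_ml,
        1.0,  # no dilution needed if already below both limits
    )


# Hypothetical solubilized inclusion bodies: 8 M urea, 5 mg/ml (5000 ug/ml) protein.
print(min_refolding_dilution(8.0, 5000.0))  # 10.0
```

In this example the urea alone would need only a 4-fold dilution;
the protein concentration, not the urea, dictates the 10-fold
figure.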

Labeling Proteins 

When used as a component in assay systems such as those
described below, the target protein may be labeled, either
directly or indirectly, to facilitate detection of the present
res-like molecules either in vitro or in vivo. Any of a variety
of suitable labeling systems may be used including but not limited
to radioisotopes such as 125I; enzyme labeling systems that
generate a detectable colorimetric signal or light when exposed to
substrate; and fluorescent labels.

Where recombinant DNA technology is used for protein
production, it may be advantageous to engineer fusion proteins
that can facilitate labeling, immobilization and/or detection.
These fusion proteins may, for example, add amino acids which
facilitate further chemical modification. They also may add a
functional moiety, such as an enzyme, which directly facilitates
detection.
TRANSGENIC ANIMALS

The invention further contemplates animal models for studying
the function of the present molecules and for overproducing the
protein products. The disclosed DNA sequences may be used in
conjunction with techniques for producing transgenic animals that
are well known to those of skill in the art.

To prepare transgenic animals, target gene sequences may, for
example, be introduced into, and overexpressed in, the genome of
the animal of interest, or, if endogenous target gene sequences
are present, they may either be overexpressed or, alternatively,
be disrupted in order to underexpress or inactivate target gene
expression, such as described for the disruption of apoE in mice
(Plump et al., Cell 71:343-353 (1992)).

In order to overexpress a target gene sequence, the coding
portion of the target gene sequence may be ligated to a regulatory
sequence which is capable of driving gene expression in the animal
and cell type of interest. Such regulatory regions will be well
known to those of skill in the art, and may be utilized in the
absence of undue experimentation.

For underexpression of an endogenous target gene sequence,
such a sequence may be isolated and engineered such that when
reintroduced into the genome of the animal of interest, the
endogenous target gene alleles will be inactivated. Preferably,
the engineered target gene sequence is introduced via gene
targeting such that the endogenous target sequence is disrupted
upon integration of the engineered target gene sequence into the
animal's genome. Animals of any species, including, but not
limited to, mice, rats, rabbits, guinea pigs, pigs, micro-pigs,
goats, and non-human primates, e.g., baboons, monkeys, and
chimpanzees, may be used to generate cardiovascular disease animal
models. Goats, cows and sheep are particularly preferred for
producing protein in vivo.

Any technique known in the art may be used to introduce a
target gene transgene into animals to produce the founder lines of
transgenic animals. Such techniques include, but are not limited
to, pronuclear microinjection (Hoppe et al., U.S. Pat. No.
4,873,191 (1989)); retrovirus mediated gene transfer into germ
lines (Van der Putten et al., Proc. Natl. Acad. Sci. USA
82:6148-6152 (1985)); gene targeting in embryonic stem cells
(Thompson et al., Cell 56:313-321 (1989)); electroporation of
embryos (Lo, Mol. Cell. Biol. 3:1803-1814 (1983)); and
sperm-mediated gene transfer (Lavitrano et al., Cell 57:717-723
(1989)); etc. For a review of such techniques, see Gordon,
Transgenic Animals, Intl. Rev. Cytol. 115:171-229 (1989).

The present invention provides for transgenic animals that
carry the transgene in all their cells, as well as animals which
carry the transgene in some, but not all their cells, i.e., mosaic
animals. The transgene may be integrated as a single transgene or
in concatemers, e.g., head-to-head tandems or head-to-tail
tandems. The transgene may also be selectively introduced into
and activated in a particular cell type by following, for example,
the teaching of Lasko et al. (Lasko et al., Proc. Natl. Acad. Sci.
USA 89:6232-6236 (1992)). The regulatory sequences required for
such a cell-type specific activation will depend upon the
particular cell type of interest, and will be apparent to those of
skill in the art. When it is desired that the target gene be
integrated into the chromosomal site of the endogenous target
gene, gene targeting is preferred. Briefly, when such a technique
is to be utilized, vectors containing some nucleotide sequences
homologous to the endogenous target gene of interest are designed
for the purpose of integrating, via homologous recombination with
chromosomal sequences, into and disrupting the function of the
nucleotide sequence of the endogenous target gene.

The transgene may also be selectively introduced into a
particular cell type, thus inactivating the endogenous gene of
interest in only that cell type, by following, for example, the
teaching of Gu et al., Science 265:103-106 (1994). The
regulatory sequences required for such a cell-type specific
inactivation will depend upon the particular cell type of
interest, and will be apparent to those of skill in the art.

Once transgenic animals have been generated, the expression
of the recombinant target gene and protein may be assayed
utilizing standard techniques. Initial screening may be
accomplished by Southern blot analysis or PCR techniques to
analyze animal tissues to assay whether integration of the
transgene has taken place. The level of mRNA expression of the
transgene in the tissues of the transgenic animals may also be
assessed using techniques which include but are not limited to
Northern blot analysis of tissue samples obtained from the animal,
in situ hybridization analysis, and RT-PCR. Samples of target
gene-expressing tissue may also be evaluated immunocytochemically
using antibodies specific for the target gene transgene product of
interest.

The transgenic animals that express target gene mRNA or
target gene transgene peptide (detected immunocytochemically,
using antibodies directed against the target gene product's
epitopes) at easily detectable levels should then be further
evaluated to identify those animals which display characteristic
increased susceptibility to carcinogenesis. Additionally,
specific cell types within the transgenic animals may be analyzed
and assayed in vitro for cellular phenotypes characteristic of the
mutant phenotype.

Once target gene transgenic founder animals are produced,
they may be bred, inbred, outbred, or crossbred to produce
colonies of the particular animal. Examples of such breeding
strategies include but are not limited to: outbreeding of founder
animals with more than one integration site in order to establish
separate lines; inbreeding of separate lines in order to produce
compound target gene transgenics that express the target gene
transgene of interest at higher levels because of the effects of
additive expression of each target gene transgene; crossing of
heterozygous transgenic animals to produce animals homozygous for
a given integration site in order both to augment expression and
eliminate the possible need for screening of animals by DNA
analysis; crossing of separate homozygous lines to produce
compound heterozygous or homozygous lines; and breeding animals to
different inbred genetic backgrounds so as to examine effects of
modifying alleles on expression of the target gene transgene and
the possible development of carcinogenesis. One such approach is
to cross the target gene transgenic founder animals with a wild
type strain to produce an F1 generation that exhibits increased
susceptibility to carcinogenesis. The F1 generation may then be
inbred in order to develop a homozygous line, if it is found that
homozygous target gene transgenic animals are viable.
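
The crosses above follow ordinary single-locus Mendelian
segregation, which gives the expected genotype ratios for each
breeding step. A minimal sketch, assuming one integration site,
independent transmission of each allele, and equal viability of all
genotypes (the text notes homozygotes may not be viable, which
would distort the ratios); the allele symbols "T" and "-" and the
function name are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: tuple[str, str], parent2: tuple[str, str]) -> dict[str, Fraction]:
    """Offspring genotype frequencies at one locus under Mendelian segregation.

    Each parent is a pair of alleles: "T" = transgene present at the
    integration site, "-" = no insertion at that site.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


het = ("T", "-")
print(cross(het, het))         # heterozygote x heterozygote
print(cross(het, ("-", "-")))  # founder x wild type
```

Crossing two heterozygotes yields one quarter homozygous "TT"
offspring, while the founder by wild-type cross yields an F1 that
is half transgenic, which is why the F1 must be screened before
further inbreeding.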

Methods of generating "knockout" mice using homologous
recombination in embryonic stem cells are well known in the art.
Suitable methods are described, for example, in Mansour et al.,
Nature 336:348 (1988); Zijlstra et al., Nature 342:435 (1989)
and 344:742 (1990); and Hasty et al., Nature 350:243 (1991). The
genomic DNA used for such targeting can be obtained by
conventional methods using the cDNA sequence as a probe in a
commercially-available genomic DNA library.

Briefly, a genomic fragment is cleaved with a restriction
endonuclease and a heterologous cassette containing a
neomycin-resistance gene is inserted at the cleavage site. A
suitable cassette is the GTI-II neo cassette described by Lufkin
et al., Cell 66:1105 (1991). The modified genomic fragment is
cloned into a suitable targeting vector that is introduced into
murine embryonic stem cells by electroporation. Cells that have
undergone homologous recombination (and hence disruption of the
gene) are selected by resistance to G418, and used to generate
chimeric mice using well known methods. See Lufkin et al., supra.
Traditional breeding methods then can be used to generate mice
that are homozygous for the disrupted gene.
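
The construction step (cleave a genomic fragment at a restriction
site, insert a neomycin-resistance cassette) can be modeled at the
sequence level. This is a minimal sketch under stated assumptions:
the enzyme (EcoRI, which cuts G^AATTC), the requirement for a
unique site, the helper name `insert_cassette`, and the "<neo>"
placeholder are all hypothetical, since the text names no enzyme;
overhang compatibility and cassette orientation are not modeled.

```python
def insert_cassette(fragment: str, site: str, cut_offset: int, cassette: str) -> str:
    """Insert a cassette at the cut position of a restriction site.

    The site must occur exactly once so that the insertion point is
    unambiguous. Sticky-end chemistry and orientation are not modeled:
    the ligated product is represented as a plain text insertion.
    """
    fragment = fragment.upper()
    hits = []
    pos = fragment.find(site)
    while pos != -1:
        hits.append(pos)
        pos = fragment.find(site, pos + 1)
    if len(hits) != 1:
        raise ValueError(f"expected one {site} site, found {len(hits)}")
    cut = hits[0] + cut_offset
    return fragment[:cut] + cassette + fragment[cut:]


# Hypothetical example: EcoRI cuts after the first base of GAATTC (offset 1);
# "<neo>" stands in for the neomycin-resistance cassette.
print(insert_cassette("AAAGAATTCCCC", "GAATTC", 1, "<neo>"))  # AAAG<neo>AATTCCCC
```

Requiring a unique site mirrors the practical need for a single,
predictable disruption point within the targeted gene.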

The phenotype of mice that are homozygous for the mutation
then can be studied to provide insights into the role of the
protein in, for example, carcinogenesis. These mice also can be
used as models for developing new treatments for cancers. If this
mutation is lethal in homozygous mice (for example, during
embryogenesis), heterozygous mice, which express only half the
amount of the protein, can also be studied.

GENE THERAPY APPLICATIONS 

When mutations in the inventive protein, or in the elements
controlling expression of that protein, are found to be associated
with a malignant phenotype, control of cellular proliferation can
be restored by gene therapy methods. For example, overexpression
of the protein can be counteracted by concurrent expression of an
antisense molecule that binds to and inhibits expression of the
mRNA encoding the protein. Alternatively, overexpression can be
inhibited in an analogous manner using a ribozyme that cleaves the
mRNA. In another embodiment, where expression of a mutated
protein induces the malignant phenotype, concomitant expression of
the non-mutated molecule via introduction of an exogenous gene may
be used. Methods of using antisense and ribozyme technology to
control gene expression, or of gene therapy methods for expression
of an exogenous gene in this manner, are well known in the art.
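
The antisense approach relies on Watson-Crick base pairing: an
antisense RNA is the reverse complement of its target mRNA
segment, so the two strands pair antiparallel along their full
length. A minimal sketch, assuming an RNA alphabet (A, C, G, U) and
a hypothetical six-base target; it ignores practical design
criteria such as target-site accessibility, which the text does not
address.

```python
# Watson-Crick pairing for RNA.
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mrna: str) -> str:
    """Return the antisense RNA (reverse complement), written 5' to 3'."""
    return "".join(RNA_PAIRS[base] for base in reversed(mrna.upper()))


def forms_perfect_duplex(mrna: str, antisense: str) -> bool:
    """True if every base of the antisense strand pairs with the target."""
    return len(mrna) == len(antisense) and antisense_rna(mrna) == antisense.upper()


target = "AUGGCU"  # hypothetical mRNA segment, 5'->3'
print(antisense_rna(target))  # AGCCAU
```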

Each of these methods requires a system for introducing a
vector into the cells containing the mutated gene. The vector
encodes either an antisense or ribozyme transcript of the
inventive protein. The construction of a suitable vector can be
achieved by any of the methods well known in the art for the
insertion of exogenous DNA into a vector. See, e.g., Sambrook et
al., Molecular Cloning (Cold Spring Harbor Press, 2d ed. 1989),
which is incorporated herein by reference. In addition, the prior
art teaches various methods of introducing exogenous genes into
cells in vivo. See Rosenberg et al., Science 242:1575-1576 (1988)
and Wolff et al., PNAS 86:9011-9014 (1989), which are incorporated
herein by reference. The routes of delivery include systemic
administration and administration in situ. Well-known techniques
include systemic administration with cationic liposomes, and
administration in situ with viral vectors. Any one of the gene
delivery methodologies described in the prior art is suitable for
the introduction of a recombinant vector containing an inventive
gene according to the invention into an MTX-resistant,
transport-deficient cancer cell. A listing of present-day vectors
suitable for the purpose of this invention is set forth in
Hodgson, Bio/Technology 13:222 (1995), which is incorporated by
reference.

For example, liposome-mediated gene transfer is a suitable
method for the introduction of a recombinant vector containing an
inventive gene according to the invention into an MTX-resistant,
transport-deficient cancer cell. The use of a cationic liposome,
such as DC-Chol/DOPE liposome, has been widely documented as an
appropriate vehicle to deliver DNA to a wide range of tissues
through intravenous injection of DNA/cationic liposome complexes.
See Caplen et al., Nature Med. 1:39-46 (1995) and Zhu et al.,
Science 261:209-211 (1993), which are herein incorporated by
reference. Liposomes transfer genes to the target cells by fusing
with the plasma membrane. The entry process is relatively
efficient, but once inside the cell, the liposome-DNA complex has
no inherent mechanism to deliver the DNA to the nucleus. As such,
most of the lipid and DNA gets shunted to cytoplasmic waste
systems and destroyed. The obvious advantage of liposomes as a
gene therapy vector is that liposomes contain no proteins, which
minimizes the potential of host immune responses.

As another example, viral vector-mediated gene transfer is
also a suitable method for the introduction of the vector into a
target cell. Appropriate viral vectors include adenovirus vectors
and adeno-associated virus vectors, retrovirus vectors and
herpesvirus vectors.

Adenoviruses are linear, double-stranded DNA viruses
complexed with core proteins and surrounded by capsid proteins.
The common serotypes 2 and 5, which are not associated with any
human malignancies, are typically the base vectors. By deleting
parts of the virus genome and inserting the desired gene under the
control of a constitutive viral promoter, the virus becomes a
replication-deficient vector capable of transferring the exogenous
DNA to differentiated, non-proliferating cells. To enter cells,
the adenovirus fibre interacts with specific receptors on the cell
surface, and the adenovirus surface proteins interact with the
cell surface integrins. The virus penton-cell integrin
interaction provides the signal that brings the exogenous
gene-containing virus into a cytoplasmic endosome. The adenovirus
breaks out of the endosome and moves to the nucleus, the viral
capsid falls apart, and the exogenous DNA enters the cell nucleus,
where it functions, in an epichromosomal fashion, to express the
exogenous gene. Detailed discussions of the use of adenoviral
vectors for gene therapy can be found in Berkner, Biotechniques
6:616-629 (1988) and Trapnell, Advanced Drug Delivery Rev.
12:185-199 (1993), which are herein incorporated by reference.

Adenovirus-derived vectors, particularly non-replicative
adenovirus vectors, are characterized by their ability to
accommodate exogenous DNA of 7.5 kb, relative stability, wide host
range, low pathogenicity in man, and high titers (10^4 to 10^5
plaque forming units per cell). See Stratford-Perricaudet et al.,
PNAS 89:2581 (1992).

Adeno-associated virus (AAV) vectors also can be used for the
present invention. AAV is a linear single-stranded DNA parvovirus
that is endogenous to many mammalian species. AAV has a broad
host range despite the limitation that AAV is a defective
parvovirus which is dependent totally on either adenovirus or
herpesvirus for its reproduction in vivo. The use of AAV as a
vector for the introduction into target cells of exogenous DNA is
well known in the art. See, e.g., Lebkowski et al., Mole. & Cell.
Biol. 8:3988 (1988), which is incorporated herein by reference.
In these vectors, the capsid gene of AAV is replaced by a desired
DNA fragment, and transcomplementation of the deleted capsid
function is used to create a recombinant virus stock. Upon
infection, the recombinant virus uncoats in the nucleus and
integrates into the host genome.

Another suitable virus-based gene delivery mechanism is
retroviral vector-mediated gene transfer. In general, retroviral
vectors are well known in the art. See Breakfield et al., Mole.
Neuro. Biol. 1:339 (1987) and Shih et al., in Vaccines 85:177
(Cold Spring Harbor Press 1985). A variety of retroviral vectors
and retroviral vector-producing cell lines can be used for the
present invention. Appropriate retroviral vectors include Moloney
Murine Leukemia Virus, spleen necrosis virus, and vectors derived
from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma
Virus, avian leukosis virus, human immunodeficiency virus,
myeloproliferative sarcoma virus, and mammary tumor virus. These
vectors include replication-competent and replication-defective
retroviral vectors. In addition, amphotropic and xenotropic
retroviral vectors can be used. In carrying out the invention,
retroviral vectors can be introduced to a tumor directly or in the
form of free retroviral vector producing cell lines. Suitable
producer cells include fibroblasts, neurons, glial cells,
keratinocytes, hepatocytes, connective tissue cells, ependymal
cells, and chromaffin cells. See Wolff et al., PNAS 84:3344
(1989).

Retroviral vectors generally are constructed such that the
majority of their structural genes are deleted or replaced by
exogenous DNA of interest, and such that the likelihood is reduced
that viral proteins will be expressed. See Bender et al., J.
Virol. 61:1639 (1987) and Armentano et al., J. Virol. 61:1647
(1987), which are herein incorporated by reference. To facilitate
expression of the antisense or ribozyme molecule of the inventive
protein, a retroviral vector employed in the present invention
must integrate into the host cell genome, an event which occurs
only in mitotically active cells. The necessity for host cell
replication effectively limits retroviral gene expression to tumor
cells, which are highly replicative, and to a few normal tissues.
The normal tissue cells theoretically most likely to be transduced
by a retroviral vector, therefore, are the endothelial cells that
line the blood vessels that supply blood to the tumor. In
addition, it is also possible that a retroviral vector would
integrate into white blood cells either in the tumor or in the
blood circulating through the tumor.

The spread of retroviral vector to normal tissues, however,
is limited. The local administration to a tumor of a retroviral
vector or retroviral vector producing cells will restrict vector
propagation to the local region of the tumor, minimizing
transduction, integration, expression and subsequent cytotoxic
effect on surrounding cells that are mitotically active.

Both replicatively deficient and replicatively competent
retroviral vectors can be used in the invention, subject to their
respective advantages and disadvantages. For instance, for tumors
that have spread regionally, such as lung cancers, the direct
injection of cell lines that produce replication-deficient vectors
may not deliver the vector to a large enough area to completely
eradicate the tumor, since the vector will be released only from
the original producer cells and their progeny, and diffusion is
limited. Similar constraints apply to the application of
replication-deficient vectors to tumors that grow slowly, such as
human breast cancers, which typically have doubling times of 30
days versus the 24 hours common among human gliomas. The much
shortened survival time of the producer cells, probably no more
than 7-10 days in the absence of immunosuppression, limits the
exposure of the tumor cells to the retroviral vector to only a
portion of their replicative cycle.
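
The contrast between slow- and fast-growing tumors can be made
concrete by dividing the producer-cell lifetime by the tumor
doubling time, using the figures given above. This ratio is only a
rough proxy, assuming constant vector release over the producer
lifetime and ignoring that individual tumor cells divide
asynchronously; the function name is hypothetical.

```python
def cycles_exposed(producer_survival_days: float, tumor_doubling_days: float) -> float:
    """Tumor doubling times that elapse while producer cells still release vector."""
    return producer_survival_days / tumor_doubling_days


SURVIVAL_DAYS = 10.0  # upper end of the 7-10 day estimate in the text

breast = cycles_exposed(SURVIVAL_DAYS, 30.0)  # ~30-day doubling time
glioma = cycles_exposed(SURVIVAL_DAYS, 1.0)   # ~24-hour doubling time
print(f"breast cancer: {breast:.2f} doubling times")  # 0.33
print(f"glioma: {glioma:.1f} doubling times")         # 10.0
```

Even at the optimistic 10-day survival, a breast tumor completes
only about a third of a doubling while vector is being released,
whereas a glioma passes through roughly ten.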

The use of replication-defective retroviruses for treating
tumors requires producer cells and is limited because each
replication-defective retrovirus particle can enter only a single
cell and cannot productively infect others thereafter. Because
these replication-defective retroviruses cannot spread to other
tumor cells, they would be unable to completely penetrate a deep,
multilayered tumor in vivo. See Markert et al., Neurosurg.
77:590 (1992). The injection of replication-competent retroviral
vector particles or a cell line that produces a
replication-competent retroviral vector virus may prove to be a
more effective therapeutic because a replication-competent
retroviral vector will establish a productive infection that will
transduce cells as long as it persists. Moreover, replicatively
competent retroviral vectors may follow the tumor as it
metastasizes, carried along and propagated by transduced tumor
cells. The risks for complications are greater with replicatively
competent vectors, however. Such vectors may pose a greater risk
than replicatively deficient vectors of transducing normal
tissues, for instance. The risks of undesired vector propagation
for each type of cancer and affected body area can be weighed
against the advantages in the situation of replicatively competent
versus replicatively deficient retroviral vector to determine an
optimum treatment.

Both amphotropic and xenotropic retroviral vectors may be
used in the invention. Amphotropic viruses have a very broad host
range that includes most or all mammalian cells, as is well known
in the art. Xenotropic viruses can infect all mammalian cells
except mouse cells. Thus, amphotropic and xenotropic
retroviruses from many species, including cows, sheep, pigs, dogs,
cats, rats, and mice, inter alia, can be used to provide retroviral
vectors in accordance with the invention, provided the vectors can
transfer genes into proliferating human cells in vivo.

Clinical trials employing retroviral vector therapy treatment
of cancer have been approved in the United States. See Culver,
Clin. Chem. 40:510 (1994). Retroviral vector-containing cells
have been implanted into brain tumors growing in human patients.
See Oldfield et al., Hum. Gene Ther. 4:39 (1993). These
retroviral vectors carried the HSV-1 thymidine kinase (HSV-tk)
gene into the surrounding brain tumor cells, which conferred
sensitivity of the tumor cells to the antiviral drug ganciclovir.
Some of the limitations of current retroviral based cancer
therapy, as described by Oldfield, are: (1) the low titer of virus
produced, (2) virus spread is limited to the region surrounding
the producer cell implants, (3) possible immune response to the
producer cell line, (4) possible insertional mutagenesis and
transformation of retrovirally infected cells, (5) only a single
treatment regimen of pro-drug, ganciclovir, is possible because
the "suicide" product kills retrovirally infected cells and
producer cells, and (6) the bystander effect is limited to cells
in direct contact with retrovirally transformed cells. See Bi et
al., Human Gene Therapy 4:725 (1993).

Yet another suitable virus-based gene delivery mechanism is
herpesvirus vector-mediated gene transfer. While much less is
known about the use of herpesvirus vectors, replication-competent
HSV-1 viral vectors have been described in the context of
antitumor therapy. See Martuza et al., Science 252: (1991),
which is incorporated herein by reference.

DIAGNOSTIC METHODS 

The present invention also contemplates, for certain
molecules described below, methods for diagnosis of human disease.
In particular, patients can be screened for the occurrence of
cancers, or likelihood of occurrence of cancers, associated with
mutations in the encoded protein. DNA from tumor tissue obtained
from patients suffering from cancer can be isolated, and the gene
encoding the protein can be sequenced. By examining a number of
patients in this manner, mutations in the gene that are associated
with a malignant cellular phenotype can be identified. In
addition, correlation of the nature of the observed mutations with
subsequent observed clinical outcomes allows development of a
prognostic model for the predicted outcome in a particular
patient.

Screening for mutations conveniently can be carried out at
the DNA level by use of PCR, although the skilled artisan will be
aware that many other well known methods are available for the
screening. PCR primers can be selected that flank known mutation
sites, and the PCR products can be sequenced to detect the
occurrence of the mutation. Alternatively, the 3' residue of one
PCR primer can be selected to be a match only for the residue
found in the unmutated gene. If the gene is mutated, there will
be a mismatch at the 3' end of the primer, primer extension
cannot occur, and no PCR product will be obtained. Alternatively,
primer mixtures can be used where the 3' residue of one primer is
WO 01/98454 PCT/IB01/02050 
any nucleotide other than the nonmutated residue. Observation of
a PCR product then indicates that a mutation has occurred. Other
methods of using, for example, oligonucleotide probes to screen
for mutations are described, for example, in U.S. Patent No.
^67113361, which is herein incorporated by reference in its
entirety.
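The 3'-mismatch logic described above can be sketched as follows. This is a minimal, hypothetical Python model, not a primer-design tool: it assumes a forward primer whose 3'-terminal base sits exactly on the mutation site and treats a single 3'-terminal mismatch as fully blocking extension, whereas real allele-specific PCR also depends on the polymerase, the mismatch type and the cycling conditions. The function names and the example sequence are illustrative only.

```python
# Hypothetical sketch of allele-specific (3'-mismatch) PCR logic.
# Assumption: a 3'-terminal mismatch completely blocks primer extension;
# internal mismatches are ignored.


def allele_specific_primer(template: str, site: int, length: int = 20) -> str:
    """Forward primer whose 3'-terminal base is the residue at `site` (0-based)."""
    if site + 1 < length:
        raise ValueError("not enough upstream sequence for a primer of this length")
    return template[site + 1 - length: site + 1].upper()


def predicts_product(primer: str, template: str, site: int) -> bool:
    """True if the primer's 3' base matches the template residue at `site`."""
    return primer[-1] == template[site].upper()


def mutant_detecting_mix(template: str, site: int, length: int = 20) -> list:
    """Primer mixture whose 3' base is any nucleotide other than the wild-type one."""
    wild_type_primer = allele_specific_primer(template, site, length)
    return [wild_type_primer[:-1] + base
            for base in "ACGT" if base != wild_type_primer[-1]]


wild_type = "ATGGCTTCAGGACTGAAACGTCCGATTGCA"
mutant = wild_type[:20] + "A" + wild_type[21:]  # hypothetical T->A change at index 20

primer = allele_specific_primer(wild_type, site=20, length=15)
print(predicts_product(primer, wild_type, 20))  # True: product expected
print(predicts_product(primer, mutant, 20))     # False: 3' mismatch, no product
mix = mutant_detecting_mix(wild_type, site=20, length=15)
print(any(predicts_product(p, mutant, 20) for p in mix))  # True: mutation detected
```

The wild-type-specific primer and the mutant-detecting mixture are complementary readouts: the first yields a product only on the unmutated template, the second only when the site carries a different base.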

Alternatively, antibodies can be generated that selectively
bind either mutated or non-mutated protein. The antibodies then
can be used to screen tissue samples for occurrence of mutations
in a manner analogous to the DNA-based methods described supra.

The diagnostic methods described above can be used not only
for diagnosis and for prognosis of existing disease, but also to
predict the likelihood of the future occurrence of disease. For
example, clinically healthy patients can be screened for
mutations in the inventive molecule that correlate with later
disease onset. Such mutations may be observed in the heterozygous
state in healthy individuals. In such cases, a single further
mutation event (in the remaining normal allele) can effectively
disable proper functioning of the gene and induce a transformed
or malignant phenotype. This screening also may be carried out
prenatally or neonatally.

DNA molecules according to the invention also are well suited
for use in so-called "gene chip" diagnostic applications. Such
applications have been developed by, inter alia, Synteni and
Affymetrix. Briefly, all or part of the DNA molecules of the
invention can be used either as a probe to screen a polynucleotide
array on a "gene chip," or they may be immobilized on the chip
itself and used to identify other polynucleotides via
hybridization to the surface of the chip. In this manner, for
example, related genes can be identified, or expression patterns
of the gene in various tissues can be studied simultaneously.
Such gene chips have particular application for diagnosis of
disease, or in forensic analysis to detect the presence or absence
of an analyte. Suitable chip technology is described, for example,
in Wodicka et al., Nature Biotechnology 15: 1359 (1997), which is
hereby incorporated by reference in its entirety, and in the
references cited therein.
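As a simplified illustration of hybridization-based detection, the hypothetical Python sketch below treats a probe as binding a target only when the target contains the probe's exact reverse complement. Real arrays tolerate some mismatches and depend on hybridization stringency, so this perfect-match rule is an assumption made for clarity; the sample names and sequences are invented.

```python
# Hypothetical perfect-match model of probe/target hybridization on an array.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """True if the target contains a stretch exactly complementary to the probe."""
    return reverse_complement(probe) in target.upper()


def expression_pattern(probe: str, samples: dict) -> dict:
    """Count, per sample, how many transcripts hybridize to one immobilized probe."""
    return {name: sum(hybridizes(probe, t) for t in transcripts)
            for name, transcripts in samples.items()}


probe = "TTGGAT"  # invented probe; its reverse complement is ATCCAA
samples = {
    "tissue_A": ["GGGATCCAAGT", "CCCCGGGG"],
    "tissue_B": ["TTTTAAAA"],
}
print(expression_pattern(probe, samples))  # {'tissue_A': 1, 'tissue_B': 0}
```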

PROTEIN-PROTEIN INTERACTIONS

Due to their similarity to certain known proteins, it is
anticipated that some of the inventive protein molecules will
interact with another class of cellular proteins. This is
particularly true of those molecules containing leucine zipper
motifs.

Any method suitable for detecting protein-protein
interactions can be employed for identifying interacting targets.
Among the traditional methods which can be employed are
co-immunoprecipitation, crosslinking and co-purification through
gradients or chromatographic columns. Utilizing procedures such
as these allows for the identification of GAP gene products. Once
identified, a GAP protein can be used, in conjunction with
standard techniques, to identify its corresponding pathway gene.
For example, at least a portion of the amino acid sequence of the
pathway gene product can be ascertained using techniques well
known to those of skill in the art, such as the Edman
degradation technique (see, e.g., Creighton, 1983, PROTEINS:
STRUCTURES AND MOLECULAR PRINCIPLES, W.H. Freeman & Co., N.Y.,
pp. 34-49). The amino acid sequence obtained can be used as a
guide for the generation of oligonucleotide mixtures that can be
used to screen for pathway gene sequences. Screening can be
accomplished, for example, by standard hybridization or PCR
techniques. Techniques for the generation of oligonucleotide
mixtures and for screening are well known (see, e.g., Ausubel,
supra, and PCR PROTOCOLS: A GUIDE TO METHODS AND APPLICATIONS,
1990, Innis et al., eds., Academic Press, Inc., New York).
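Choosing which stretch of a partial amino acid sequence to use for a degenerate oligonucleotide mixture is usually guided by how many codon combinations that stretch implies. The hypothetical Python sketch below counts this degeneracy from the standard genetic code and picks the least degenerate window. The peptide is invented, and practical probe design also weighs codon usage, inosine substitution and melting temperature, none of which are modeled here.

```python
import math

# Number of sense codons per amino acid in the standard genetic code (sums to 61).
CODONS_PER_AA = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Number of distinct DNA sequences that could encode the peptide."""
    return math.prod(CODONS_PER_AA[aa] for aa in peptide.upper())


def least_degenerate_window(peptide: str, k: int) -> tuple:
    """The k-residue window whose oligonucleotide mixture would be smallest."""
    windows = (peptide[i:i + k] for i in range(len(peptide) - k + 1))
    return min(((w, degeneracy(w)) for w in windows), key=lambda pair: pair[1])


print(least_degenerate_window("MKWYDCLS", 5))  # ('MKWYD', 8)
```

Windows rich in Met and Trp (one codon each) give the smallest mixtures, while Leu, Arg and Ser (six codons each) inflate them quickly.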

Additionally, methods can be employed which result in the
simultaneous identification of interacting target genes. One
method which detects protein interactions in vivo, the two-hybrid
system, is described in detail for illustration purposes only and
not by way of limitation. One version of this system has been
described (Chien et al., Proc. Natl. Acad. Sci. USA, 88: 9578-9582
(1991)) and is commercially available from Clontech (Palo Alto,
CA).

Briefly, utilizing such a system, plasmids are constructed
that encode two hybrid proteins: one consists of the DNA-binding
domain of a transcription activator protein fused to a known
protein, in this case an inventive protein, and the other contains
the activator protein's activation domain fused to an unknown
protein (a putative GAP, for instance) that is encoded by a cDNA
which has been recombined into this plasmid as part of a cDNA
library. The plasmids are transformed into a strain of the yeast
Saccharomyces cerevisiae that contains a reporter gene (e.g.,
lacZ) whose regulatory region contains the transcription
activator's binding sites. Neither hybrid protein alone can
activate transcription of the reporter gene: the DNA-binding
domain hybrid cannot because it does not provide activation
function, and the activation domain hybrid cannot because it
cannot localize to the activator's binding sites. Interaction of
the two hybrid proteins reconstitutes the functional activator
protein and results in expression of the reporter gene, which is
detected by an assay for the reporter gene product.

The two-hybrid system or related methodology can be used to
screen activation domain libraries for proteins that interact with
a known "bait" gene product. By way of example, and not by way of
limitation, gene products known to be involved in TH cell
subpopulation-related disorders and/or differentiation,
maintenance, and/or effector function of the subpopulations can be
used as the bait gene products. Total genomic or cDNA sequences
are fused to the DNA encoding an activation domain. This library
and a plasmid encoding a hybrid of the bait gene product fused to
the DNA-binding domain are cotransformed into a yeast reporter
strain, and the resulting transformants are screened for those
that express the reporter gene. For example, and not by way of
limitation, the bait gene can be cloned into a vector such that it
is translationally fused to the DNA encoding the DNA-binding
domain of the GAL4 protein. Colonies expressing the reporter gene
are purified, and the library plasmids responsible for reporter
gene expression are isolated. DNA sequencing is then used to
identify the proteins encoded by the library plasmids.

The present invention, thus generally described, will be
understood more readily by reference to the following examples,
which are provided by way of illustration and are not intended to
be limiting of the present invention.

The examples below are provided to illustrate the subject
invention. These examples are provided by way of illustration and
are not included for the purpose of limiting the invention.

EXAMPLES 



EXAMPLE I: cDNA Library Construction



cDNA library plates and clones originated from five cDNA
libraries that were constructed by directional cloning. These are
available through the Resource Center (http://www.rzpd.de) of the
German Genome Project. In particular, the hfbr2 (human fetal
brain; RZPD number DKFZp564) and hfkd2 (human fetal kidney;
DKFZp566) libraries were generated using the SMART kit
(Clontech), except that PCR was carried out with primers that
contained uracil residues, to permit directional cloning without
restriction digestion and ligation, and that were complementary to
the pAMP1 (LifeTechnologies) cloning sites for directional
cloning. The htes3 (human testis; DKFZp434), hute1 (human uterus;
DKFZp586) and hmcf1 (human mammary carcinoma; DKFZp727) libraries
are conventional size-selected cDNA libraries (Gubler, U.,
Hoffman, B.J. (1983) A simple and very efficient method for
generating cDNA libraries. Gene 25, 263-269). They are cloned into
pSPORT1 (LifeTechnologies) via a NotI site, which is introduced
during reverse transcription downstream of the oligo dT primer,
and a SalI site, which is introduced by the ligation of adapters.
The human mammary carcinoma library was constructed from MCF7
cells.

In a similar fashion, the hamy2 (human amygdala nucleus
(inside the brain); RZPD number DKFZp761) and hmel2 (human
melanoma; RZPD number DKFZp762) libraries have been generated
using conventional approaches, employing a NotI-dT V primer for
first strand synthesis (GAGCGGCCGC(T)nV). After second strand
synthesis, SalI adapters were ligated to the blunted cDNA. Then
the cDNA was cut with NotI to generate SalI-NotI compatible ends
at the 5' and 3' ends of the cDNA, respectively, to allow
directional cloning. The cDNAs were then size selected on agarose
gels in two dimensions and cloned into the pSPORT1 plasmid vector,
which had been pre-cut with SalI and NotI (LifeTechnologies). The
DNA was transformed into the DH10B bacterial strain, and single
colonies were picked into 384-well microtiter plates from the
non-amplified library. The human melanoma library was constructed
from MeWo cells, as published by Kern, M.A., Helmbach, H., Artuc,
M., Karmann, D., Jurgovsky, K. and Schadendorf, D. (1997) Human
melanoma cell lines selected in vitro displaying various levels
of drug resistance against cisplatin, fotemustine, vindesine or
etoposide: modulation of proto-oncogene expression. Anticancer
Res. 17, 4359-4370.
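The directional cloning scheme relies on the NotI site carried by the oligo(dT) primer at the 3' end of the cDNA and on the SalI adapter at the 5' end. The Python sketch below expands the degenerate primer GAGCGGCCGC(T)nV into its three variants (V standing for A, C or G) and locates the NotI (GCGGCCGC) and SalI (GTCGAC) recognition sequences. The length of the dT stretch is not given in the text, so 18 is an assumed value, and the helper names are hypothetical.

```python
import re

NOT_I = "GCGGCCGC"  # NotI recognition sequence
SAL_I = "GTCGAC"    # SalI recognition sequence


def find_sites(seq: str, site: str) -> list:
    """0-based start positions of every occurrence of `site`, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={site})", seq.upper())]


def not_i_dt_primer_mix(n_t: int = 18) -> list:
    """Expand GAGCGGCCGC(T)nV; n_t = 18 is an assumed dT length, V = A, C or G."""
    return ["GAGCGGCCGC" + "T" * n_t + v for v in "ACG"]


for primer in not_i_dt_primer_mix():
    print(primer, find_sites(primer, NOT_I), find_sites(primer, SAL_I))
```

Each primer variant carries exactly one NotI site near its 5' end and no SalI site, which is what allows the SalI and NotI ends of the cDNA to be distinguished during cloning.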

The cDNA sequences of this application were first identified
among the sequences comprising various libraries. Technology has
advanced considerably since the first cDNA libraries were made.
Many small variations in both chemicals and machinery have been
instituted over time, and these have improved both the efficiency
and safety of the process. Although the cDNAs could be obtained
using an older procedure, the procedure presented in this
application is exemplary of one currently being used by persons
skilled in the art. For the purpose of providing an exemplary
method, the mRNA isolation and cDNA library construction
described here are for the MCF-7 library (DKFZp727), from which the
clones named DKFZphmcf1_xxyyxx were obtained.

The human cell line MCF-7 was grown in DMEM supplemented
with 10% fetal calf serum until confluency. 3 x 10^6 cells were
harvested with a cell scraper in PBS. Cells were lysed in buffer
containing 0.5% NP-40 to leave the nuclei intact. The debris was
pelleted by centrifugation at 15,000 x g for 10 minutes at 4
degrees Celsius. Proteins in the supernatant were degraded in the
presence of SDS and Proteinase K (30 minutes at 56 degrees
Celsius). Proteins were removed by phenol/chloroform extraction,
and RNA was precipitated from the aqueous phase with Na-acetate
and ethanol. Polyadenylated messages were isolated using Qiagen
Oligotex (QIAGEN, Hilden, Germany).

First strand cDNA synthesis was accomplished using an
oligo(dT) primer which also contained a NotI restriction site.
Second strand synthesis was performed using a combination of DNA
polymerase I, E. coli ligase and RNase H, followed by the
addition of a SalI adaptor to the blunt-ended cDNA. The
SalI-adapted, double-stranded cDNA was then digested with the NotI
restriction enzyme and fractionated by size on an agarose gel.
DNA of the appropriate size was cut from the gel and cast into a
second gel at a 90° angle. After electrophoresis in the second
dimension, cDNA of the appropriate size was cut from the gel. The
agarose block was broken down with the help of Gelase. The cDNA
was purified by two phenol extractions and an ethanol
precipitation. The cDNA was ligated into SalI/NotI pre-digested
pSport1 vector (LifeTechnologies) and transformed into DH10B
bacteria.

All libraries were arrayed into 384-well microtiter plates
and spotted on high-density nylon membranes for hybridization
analysis.

The hamy2 library consists of 121 384-well plates comprising
46,464 clones. The hmel2 library consists of 7H 384-well plates
comprising 5?mfl clones. Filters and clones are available
through the Resource Center of the German Genome Project
(http://www.RZPD.de). Whole library plates were distributed to
the sequencing partners of the consortium for systematic
sequencing.
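The hamy2 clone count follows directly from the plate format: a standard 384-well plate has 16 rows (a to p) and 24 columns. A trivial Python check, assuming every well of every plate holds one clone:

```python
ROWS, COLUMNS = 16, 24            # standard 384-well layout (rows a-p, columns 1-24)
WELLS_PER_PLATE = ROWS * COLUMNS  # 384


def clones_on_full_plates(plates: int) -> int:
    """Clone capacity of `plates` fully filled 384-well plates."""
    return plates * WELLS_PER_PLATE


print(clones_on_full_plates(121))  # 46464
```

So 121 full plates correspond to 46,464 clones.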

EXAMPLE II: Sequencing of cDNA Clones

All clones in the 384-well microtiter plates were sequenced
from the 5' end. Sequencing was done preferentially using dye
terminator chemistry (ABD or Amersham) on ABI automated DNA
sequencers (ABI 377, Applied Biosystems); one partner used EMBL
prototype instruments (Arakis) mainly with dye primer chemistry.

The resulting expressed sequence tag (EST) sequences ("5' ESTs"
= sequenced from the 5' end) were analysed for:

a) the lack of identical matches with known genes.

For this, the EST sequence was blasted first against the cDNA
consortium's own database and then against public databases
(with BLASTn and BLASTx against EMBL/EMBLNEW and assembled
ESTs; please refer to EXAMPLE III: Bioinformatics analysis of
full length cDNAs, for description and parameter settings). ESTs
which were identical to known genes over more than 100 bp, with
fewer than 2 mismatches, were excluded from further analysis.

b) the presence of an open reading frame 

Open reading frames (ORFs) were detected with a tool
developed by the Munich Information Center for Protein Sequences
(MIPS) called ORF-map. ORF-map visualises potential start and
stop codons. If an ORF without a stop codon was detected in a
5' EST, the sequence was processed further.

c) the presence of GC rich sequences 

A script developed by MIPS computed the GC content of the
5' sequence, which should be >40%. Writing similar scripts is
within the ordinary skill of one in bioinformatics.

d) the lack of repeat structures 

Repeats such as Alu, LINE or CA repeats were detected by
blasting (BLASTn and BLASTx; please refer to EXAMPLE III:
Bioinformatics analysis of full length cDNAs, for description and
parameter settings) against a repeat database compiled by MIPS.
If a repeat was present within the 5' sequence, the sequence was
not processed further.
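Criteria b) and c) lend themselves to a short script. The Python sketch below is neither the MIPS ORF-map tool nor the MIPS GC script; it is a hypothetical stand-in that (i) asks whether any forward frame contains an ATG with no in-frame stop codon after it, that is, a frame still open at the end of the read, and (ii) requires a GC content above 40%. The BLAST-based criteria a) and d) are not modeled.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def open_to_end(seq: str) -> bool:
    """True if some forward frame has an ATG with no in-frame stop codon after it."""
    seq = seq.upper()
    for frame in range(3):
        codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
        last_stop = max((i for i, c in enumerate(codons) if c in STOP_CODONS),
                        default=-1)
        if "ATG" in codons[last_stop + 1:]:
            return True
    return False


def passes_5prime_filters(seq: str, min_gc: float = 0.40) -> bool:
    """Hypothetical combination of criteria b) and c) for a 5' EST."""
    return open_to_end(seq) and gc_content(seq) > min_gc


print(passes_5prime_filters("GCCATGGCGCCCGGGCTGCAGCGC"))  # True
print(passes_5prime_filters("ATGTAAATTTAAATATTTAA"))      # False
```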

Novel clones that met all criteria were reported to the
sequencers, who then performed 3'-end sequencing of these clones.
The resulting 3' ESTs ("3' ESTs" = sequenced from the 3' end) were
checked for:

a) the lack of matches with known genes in public databases and
with sequences already generated by us.

This was done by blasting against EMBL/EMBLNEW and assembled
ESTs (BLASTn and BLASTx; please refer to EXAMPLE III:
Bioinformatics analysis of full length cDNAs, for description and
parameter settings).



b) the presence of polyadenylation signals.



Again, only clones matching the selection criteria were
chosen to be sequenced completely by the sequencers. Clones were
selected according to the following criteria:

A very good ORF had at least one BLASTx match to other
proteins. A "good ORF" should extend to the 3' end and be longer
than ~40 codons. If the ORF started in the 5' sequence, there
should not be too many competing start codons upstream of, and in
frame with, the potential ORF start codon, and the start should
match the Kozak consensus around the ATG. If the EST sequence was
too short to decide on the basis of the potential ORF, and there
were only a few or no start codons in the sequence, the GC content
of the sequence should be greater than 40%. The 5' sequences need
not contain a polyA tail at the 3' end. In addition, in
questionable cases, the results of the BLAST search against the
assembled human ESTs could help to decide whether to stop or to
continue; a hit against these ESTs was an indication to go
further.
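Two of the start-codon criteria above, the Kozak context and the number of competing upstream in-frame ATGs, can be checked mechanically. The hypothetical Python sketch below scores only the two best-known Kozak positions (a purine at -3 and a G at +4, counted from the A of the ATG). It illustrates the idea and is not the consortium's selection procedure; the text does not say how many competing ATGs count as "too many", so that threshold is left to the user.

```python
def kozak_context(seq: str, atg_pos: int) -> str:
    """Classify the ATG starting at 0-based `atg_pos` by the -3 and +4 positions."""
    seq = seq.upper()
    if seq[atg_pos:atg_pos + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    minus3 = seq[atg_pos - 3] if atg_pos >= 3 else ""
    plus4 = seq[atg_pos + 3] if atg_pos + 3 < len(seq) else ""
    score = int(minus3 in ("A", "G")) + int(plus4 == "G")
    return {2: "strong", 1: "adequate", 0: "weak"}[score]


def upstream_in_frame_atgs(seq: str, atg_pos: int) -> int:
    """Number of ATG codons upstream of, and in frame with, the candidate start."""
    seq = seq.upper()
    return sum(1 for p in range(atg_pos % 3, atg_pos, 3) if seq[p:p + 3] == "ATG")


print(kozak_context("GCCACCATGG", 6))           # strong
print(upstream_in_frame_atgs("ATGAAAATGG", 6))  # 1
```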
Clones passing the above-described screening were sequenced
in full. Sequencing was done preferentially using dye terminator
chemistry (ABD or Amersham) on ABI automated DNA sequencers (ABI
377, Applied Biosystems); one partner used EMBL prototype
instruments (Arakis) mainly with dye primer chemistry. Primer
walking (Strauss et al. (1986) Specific-primer-directed DNA
sequencing. Anal. Biochem. 154, 353-360) was the preferred
sequencing strategy because of the lower redundancy possible
compared to random shotgun methods (Messing, J., Crea, R.,
Seeburg, P.H. (1981) A system for shotgun DNA sequencing. Nucleic
Acids Res. 9, 309-321). Walking primers were generally designed
using software (e.g. Haas, S., Vingron, M., Poustka, A.,
Wiemann, S. (1998) Primer design for large-scale sequencing.
Nucleic Acids Res. 26, 3006-3012; Schwager, C., Wiemann, S.,
Ansorge, W. GeneSkipper: integrated software environment for DNA
sequence assembly and alignment. HUGO Genome Digest 2, 8-9) that
permitted complete automation of this usually time-consuming
process and helped in the parallel processing of large numbers of
clones.
EXAMPLE III: Bioinformatics analysis of full length cDNAs



Each sequence obtained was compared at the nucleotide level in
a stepwise manner to sequences in EMBL/EMBLNEW, EMBL-EST and
EMBL-STS using the BLASTn algorithm. The Basic Local Alignment
Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:
290-300; Altschul, S.F. et al. (1990) J. Mol. Biol. 215: 403-410)
is used to search for local sequence alignments. BLAST produces
alignments of both nucleotide (BLASTn) and amino acid sequences
(BLASTp or BLASTx) to determine sequence similarity. Because of
the local nature of the alignments, BLAST is especially useful in
determining exact matches or in identifying homologs. While it is
useful for matches which do not contain gaps, it is inappropriate
for performing motif-style searching. The fundamental unit of
BLAST algorithm output is the High-scoring Segment Pair (HSP).

An HSP consists of two sequence fragments of arbitrary but
equal lengths whose alignment is locally maximal and for which
the alignment score meets or exceeds a threshold or cutoff score
set by the user. The BLAST approach is to look for HSPs between a
query sequence and a database sequence, to evaluate the
statistical significance of any matches found, and to report only
those matches which satisfy the user-selected threshold of
significance. The parameter E establishes the statistically
significant threshold for reporting database sequence matches. E
is interpreted as the upper bound of the expected frequency of
chance occurrence of an HSP (or set of HSPs) within the context
of the entire database search. Any database sequence whose match
satisfies E is reported in the program output. Parameter settings
for the BLAST operations (BLASTN 2.0a19MP-WashU) described were:
EMBL/EMBLNEW: H=0 V=5 B=5 -filter seg; EMBL-EST: H=0 E=1e-10
B=500 V=500 -filter seg; EMBL-STS: H=0 V=5 B=5.
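For a single HSP, the Karlin-Altschul statistics underlying BLAST relate the raw score S to the expected number of chance hits through E = K m n exp(-λS), where m and n are the query and database lengths and K and λ are constants of the scoring system; the bit score S' = (λS - ln K)/ln 2 gives the equivalent form E = m n 2^(-S'). The Python sketch below only illustrates these textbook formulas with made-up parameter values. It does not reproduce the sum statistics or edge-effect corrections that WU-BLAST applies in the searches described here.

```python
import math


def e_value(raw_score: float, m: int, n: int, lam: float, k: float) -> float:
    """Karlin-Altschul expected number of chance HSPs scoring at least raw_score."""
    return k * m * n * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized (bit) score, independent of the scoring system."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def p_value(e: float) -> float:
    """Probability of at least one chance HSP when hits are Poisson with mean e."""
    return 1.0 - math.exp(-e)


# Illustrative values only: lambda and K depend on the matrix and gap costs.
LAM, K = 0.267, 0.041
m, n = 300, 100_000_000  # query length, total database length
for s in (80, 100, 120):
    e = e_value(s, m, n, LAM, K)
    print(s, f"E={e:.3g}", f"bits={bit_score(s, LAM, K):.1f}", f"P={p_value(e):.3g}")
```

Because E grows linearly with m and n, the same raw score becomes less significant as the database grows, which is why a fixed E threshold such as the 1e-10 used for EMBL-EST is applied rather than a fixed score.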

The search against EMBL/EMBLNEW was done to determine whether
the cDNAs are already known, and also to find out whether the
cDNAs are encoded by genomic sequences already sequenced and
published/submitted to these databases.

The search against EMBL-EST was performed to get a first
impression of how abundant a particular cDNA would be and to get
information on tissue specificity (a so-called "electronic
Northern blot"; e.g. some of the cDNAs derived from the testis
library show hits only to ESTs also derived from testis libraries).

The cDNA sequences were blasted against EMBL-STS to
determine STS sequence matches to the cDNA, thus providing
mapping information for the new cDNA.

The potential protein sequences were generated automatically
by a script searching for the longest open reading frame (ORF) in
each of the three forward frames, subject to a minimum length in
codons. Next, the automatically generated ORFs were translated
into protein sequences. These protein sequences were searched
against the non-redundant protein data set of
PIR/SwissProt/TrEMBL/TrEMBLNEW (BLASTP 2.0a19MP-WashU, parameter
settings: V=7 B=7 H=0 -filter seg). If the script generated more
than one ORF, one ORF was chosen manually by the annotator
according to the degree of similarity to known proteins, the
location of the ORF in the cDNA, the length, the amino acid
composition and the content of PROSITE motifs.
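The ORF-calling and translation steps can be sketched as follows. This Python version is a hypothetical reimplementation, not the original script. For each of the three forward frames it reports the longest ATG-initiated ORF that reaches at least `min_codons` codons, also accepting an ORF that runs off the 3' end of a partial cDNA, and it translates ORFs with the standard genetic code. The source does not state the minimum length used, so `min_codons` is left as a free parameter.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna: str) -> str:
    """Translate complete codons; a trailing partial codon is ignored."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def longest_orfs(seq: str, min_codons: int) -> dict:
    """Per forward frame: (start, end, codons) of the longest ORF, or None."""
    seq = seq.upper()
    result = {}
    for frame in range(3):
        best, start = None, None
        last_codon_end = frame + 3 * ((len(seq) - frame) // 3)
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                candidate = (start, i + 3, (i - start) // 3)  # codon count excludes stop
                if candidate[2] >= min_codons and (best is None or candidate[2] > best[2]):
                    best = candidate
                start = None
        if start is not None:  # ORF still open at the 3' end of the sequence
            candidate = (start, last_codon_end, (last_codon_end - start) // 3)
            if candidate[2] >= min_codons and (best is None or candidate[2] > best[2]):
                best = candidate
        result[frame] = best
    return result


seq = "CCATGAAATTTGGGTAACC"
orfs = longest_orfs(seq, min_codons=3)
print(orfs)  # {0: None, 1: None, 2: (2, 17, 4)}
start, end, _ = orfs[2]
print(translate(seq[start:end]))  # MKFG*
```

When more than one frame yields an ORF, the sketch simply reports all of them; as the text describes, choosing among them was left to the annotator.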

Additionally, a BLASTX search (BLASTX 2.0a19MP-WashU against a
non-redundant protein database comprising
PIR/SWISSPROT/TrEMBL/TrEMBLNEW; parameter settings:
matrix /home/data/blast/matrix/aa/BLOSUM62 H=0 V=5 B=5 -filter
seg) was performed to find potential frame shifts in the
complementary cds of the cDNAs and to identify unspliced or partly
spliced cDNAs. The protein sequence was then transferred to the
PEDANT system in order to generate additional information on the
new proteins. PEDANT (Protein Extraction, Description, and
ANalysis Tool; Frishman, D. & Mewes, H.-W. (1997) PEDANTic genome
analysis. Trends in Genetics 13, 415-416) is a platform
developed at the Munich Information Center for Protein Sequences
(MIPS, Munich, Germany) which incorporates practically all
bioinformatics methods important for the functional and
structural characterisation of protein sequences. The
computational methods used by PEDANT are:
Very sensitive protein sequence database searches with
estimates of statistical significance. Pearson, W.R. (1990) Rapid
and sensitive sequence comparison with FASTP and FASTA. Methods
Enzymol. 183, 63-98.

BLASTS 

Very sensitive protein sequence database searches with
estimates of statistical significance. Altschul, S.F., Gish, W.,
Miller, W., Myers, E.W., and Lipman, D.J. Basic local alignment
search tool. Journal of Molecular Biology 215, 403-410.

PREDATOR 

High-accuracy secondary structure prediction from single and
multiple sequences. Frishman, D. and Argos, P. 75% accuracy in
protein secondary structure prediction. Proteins 27, 329-335.
Frishman, D. and Argos, P. (1996) Incorporation of long-distance
interactions in a secondary structure prediction algorithm.
Prot. Eng. 9, 133-142.

STRIDE 

Secondary structure assignment from atomic coordinates.
Frishman, D. and Argos, P. (1995) Knowledge-based secondary
structure assignment. Proteins 23, 566-579.

CLUSTALW

Multiple sequence alignment. Thompson, J.D., Higgins, D.G.
and Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of
progressive multiple sequence alignment through sequence
weighting, position-specific gap penalties and weight matrix
choice. Nucleic Acids Research 22: 4673-4680.

TMAP 

Transmembrane region prediction from multiply aligned
sequences. Persson, B. and Argos, P. (1994) Prediction of
transmembrane segments in proteins utilising multiple sequence
alignments. J. Mol. Biol. 237, 182-192.
Transmembrane region prediction from single sequences.
Klein, P., Kanehisa, M., and DeLisi, C. Prediction of protein
function from sequence properties: A discriminant analysis of a
database. Biochim. Biophys. Acta 787, 221-226 (1984). Version 2
by Dr. K. Nakai.

SIGNALP 

Signal peptide prediction. Nielsen, H., Engelbrecht, J.,
Brunak, S., and von Heijne, G. (1997) Identification of
prokaryotic and eukaryotic signal peptides and prediction of
their cleavage sites. Protein Engineering 10, 1-6.

SEG 

Detection of low complexity regions in protein sequences.
Wootton, J.C., Federhen, S. (1993) Statistics of local complexity
in amino acid sequences and sequence databases. Computers &
Chemistry 17, 149-163.

COILS 

Detection of coiled coils. Lupas, A., M. Van Dyke, and J.
Stock, "Predicting Coiled Coils from Protein Sequences," Science
252, 1162-1164.

PROSEARCH 

Detection of PROSITE protein sequence patterns. Kolakowski,
L.F. Jr., Leunissen, J.A.M., Smith, J.E. (1992) ProSearch: fast
searching of protein sequences with regular expression patterns
related to protein structure and function. Biotechniques 13,
919-921.

BLIMPS 

Similarity searches against a database of ungapped blocks.
J.C. Wallace and S. Henikoff (1992) PATMAT: a searching and
extraction program for sequence, pattern and block queries and
databases. CABIOS 8, 249-254. Written by Bill Alford.
Hidden Markov model software. Sonnhammer, E.L.L., Eddy,
S.R., Durbin, R. Pfam: A Comprehensive Database of Protein
Families Based on Seed Alignments. Proteins 28, 405-420.

PI

Perl script that returns the amino acid composition, molecular
weight, theoretical pI and expected extinction coefficient of an
amino acid sequence. By Fred Lindberg.

The parameter settings were as follows: known3d: score > 100;
BLAST: E-value < 10; SCOP: <= 50 alignments, E-value < 0.0001;
signalp: Y=0.7, examined from the N-terminus: 50 aa; funcat:
E-value < 0.001; BLOCKS: <= 10 hits; BLIMPS: threshold 1100.0;
COILS: threshold 0.95; SEG: threshold 20.0; BLAST in report:
E-value < 0.001; PIR-KW, superfamilies, EC numbers in report:
E-value < 0.00001; known3d in report: score > 150.
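The composition-based quantities returned by such a script are straightforward to compute. The Python sketch below is not Fred Lindberg's Perl script. It computes the amino acid composition, the average molecular weight from standard average residue masses, and the 280 nm molar extinction coefficient using the widely used Pace et al. coefficients (5500 per Trp, 1490 per Tyr and 125 per cystine, in M^-1 cm^-1). The caller must supply the number of disulfide-bonded cystines, and the theoretical pI, which depends on a chosen set of pKa values, is omitted.

```python
from collections import Counter

# Average residue masses in Da (free amino acid mass minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528


def composition(protein: str) -> Counter:
    """Count of each residue in the sequence."""
    return Counter(protein.upper())


def molecular_weight(protein: str) -> float:
    """Average mass of the polypeptide: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


def extinction_coefficient_280(protein: str, cystines: int = 0) -> int:
    """Molar extinction coefficient at 280 nm (M^-1 cm^-1), Pace et al. values."""
    counts = composition(protein)
    return 5500 * counts["W"] + 1490 * counts["Y"] + 125 * cystines


seq = "MWYCCKW"  # invented example sequence
print(composition(seq).most_common(3))
print(round(molecular_weight(seq), 2))
print(extinction_coefficient_280(seq), extinction_coefficient_280(seq, cystines=1))
```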

The results of the PEDANT analysis, together with the results
of the similarity searches, constitute the basis for the
structural and functional annotation of the cDNAs and the encoded
proteins, as specified herein.
We claim:

1. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_12g7; amy2_12i1; amy2_13g14; amy2_16e14;
amy2_24k15; amy2_2a13; amy2_2i17; fbr2_78d18; fbr2_78e18;
amy2_121m2; amy2_24b4; amy2_121f14; tes3_16b5; amy2_1i24;
amy2_1j14; amy2_2b14; amy2_7j5; amy2_14b5; amy2_2o13; fkd2_3k1;
mel2_7g14; mel2_12j1; mel2_7k14; amy2_2c22; fbr2_78i21;
amy2_11n14; amy2_1c12; amy2_1i1; amy2_2f22; amy2_2g12; fbr2_78c12;
tes3_10i16; tes3_31a10; amy2_10h17; amy2_10p7; amy2_12d7;
amy2_2f18; tes3_11c22; tes3_11d21; tes3_21f24; tes3_31j20;
tes3_5k22; Tes3_10n10; Tes3_11e17; Tes3_12d18; Tes3_14i7;
Tes3_15n14; Tes3_16p3; Tes3_14p12; Tes3_21k14; Tes3_22i11;
Tes3_22l24; tes3_26g3; tes3_30p6; amy2_11d2; amy2_121o17;
amy2_1i14; amy2_24c8; fbr2_78d14; tes3_11a17; tes3_17i21;
tes3_20h12; tes3_7n12; tes3_14e16; amy2_14m16; tes3_18n14; their
complements; and variants thereof.

2. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_12g7; amy2_12i1; amy2_13g14; amy2_16e14;
amy2_24k15; amy2_2a13; amy2_2i17; amy2_121m2; amy2_24b4;
amy2_121f14; amy2_1i24; amy2_1j14; amy2_2b14; amy2_7j5;
amy2_14b5; amy2_2o13; amy2_2c22; amy2_11n14; amy2_1c12; amy2_1i1;
amy2_2f22; amy2_2g12; amy2_10h17; amy2_10p7; amy2_12d7;
amy2_2f18; amy2_11d2; amy2_121o17; amy2_1i14; amy2_24c8; their
complements; and variants thereof.

3. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: fbr2_78d18; fbr2_78e18; fbr2_78i21; fbr2_78c12;
fbr2_78d14; their complements; and variants thereof.

4. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_121m2; amy2_24b4; their complements; and
variants thereof.



5. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_121f14; tes3_16b5; their complements; and
variants thereof.

6. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_1i24; amy2_1j14; amy2_2b14; amy2_7j5; their
complements; and variants thereof.

7. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_14b5; amy2_2o13; fkd2_3k1; mel2_7g14; their
complements; and variants thereof.

8. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: mel2_7g14; mel2_12j1; mel2_7k14; their
complements; and variants thereof.

9. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_2c22; fbr2_78i21; their complements; and
variants thereof.

10. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_11n14; amy2_1i1; amy2_2g12; fbr2_78c12;
tes3_10i16; tes3_31a10; their complements; and variants thereof.

11. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_10h17; amy2_10p7; amy2_12d7; amy2_2f18;
tes3_11c22; tes3_11d21; tes3_21f24; tes3_31j20; tes3_5k22; their
complements; and variants thereof.

12. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: tes3_16b5; tes3_10i16; tes3_31a10; tes3_11c22;
tes3_11d21; tes3_21f24; tes3_31j20; tes3_5k22; Tes3_10n10;
Tes3_11e17; Tes3_12d18; Tes3_14i7; Tes3_15n14; Tes3_16p3;
Tes3_14p12; Tes3_21k14; Tes3_22i11; Tes3_22l24; tes3_26g3;
tes3_30p6; tes3_11a17; tes3_17i21; tes3_20h12; tes3_7n12;
tes3_14e16; their complements; and variants thereof.

13. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_11d2; amy2_121o17; amy2_1i14; amy2_24c8;
fbr2_78d14; tes3_11a17; tes3_17i21; tes3_20h12; tes3_7n12;
tes3_14e16; their complements; and variants thereof.

14. An assemblage, comprising at least one nucleic acid
molecule having the sequence of a clone selected from the group
consisting of: amy2_14m16; tes3_18n14; amy2_1c12; amy2_2f22;
their complements; and variants thereof.

15. A nucleic acid molecule comprising a nucleotide
sequence of the clone fkd2_3k1.

16. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_12g7; amy2_12i1;
amy2_13g14; amy2_16e14; amy2_24k15; amy2_2a13; amy2_2i17;
fbr2_78d18; fbr2_78e18; amy2_121m2; amy2_24b4; amy2_121f14;
tes3_16b5; amy2_1i24; amy2_1j14; amy2_2b14; amy2_7j5; amy2_14b5;
amy2_2o13; fkd2_3k1; mel2_7g14; mel2_12j1; mel2_7k14; amy2_2c22;
fbr2_78i21; amy2_11n14; amy2_1c12; amy2_1i1; amy2_2f22; amy2_2g12;
fbr2_78c12; tes3_10i16; tes3_31a10; amy2_10h17; amy2_10p7;
amy2_12d7; amy2_2f18; tes3_11c22; tes3_11d21; tes3_21f24;
tes3_31j20; tes3_5k22; Tes3_10n10; Tes3_11e17; Tes3_12d18;
Tes3_14i7; Tes3_15n14; Tes3_16p3; Tes3_14p12; Tes3_21k14;
Tes3_22i11; Tes3_22l24; tes3_26g3; tes3_30p6; amy2_11d2;
amy2_121o17; amy2_1i14; amy2_24c8; fbr2_78d14; tes3_11a17;
tes3_17i21; tes3_20h12; tes3_7n12; tes3_14e16; amy2_14m16;
tes3_18n14; their complements; and variants thereof.



17. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_12g7; amy2_12i1;
amy2_13gn; amy2_lbem; amy2_24k15; amy2_2a13; amy2_2i17;
amy2_121m2; amy2_24b4; amy2_121f1T; amy2_1i24; amy2_1j1T;
amy2_2b1T; amy2_7j5; amy2_14b5; amy2_2o13; amy2_2c22; amy2_llnm;
amy2_1c12; amy2_1i1; amy2_2f22; amy2_2g12; amy2_10h17; amy2_10p7;
amy2_12d7; amy2_2f18; amy2_11d2; amy2_121o17; amy2_1i14;
amy2_24c8; their complements; and variants thereof.

18. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: fbr2_78d18; fbr2_78e18;
fbr2_78i21; fbr2_78c12; fbr2_78d4; their complements; and
variants thereof.

19. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_121m2; amy2_24b4;
their complements; and variants thereof.

20. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_121f1T; tes3_16b5;
their complements; and variants thereof.

21. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_1i24; amy2_1j1T;
amy2_2b1T; amy2_7j5; their complements; and variants thereof.

22. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_14b5; amy2_2o13;
fkd2_3k1; mel2_7g14; their complements; and variants thereof.

23. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: mel2_12j1; mel2_7kn;
their complements; and variants thereof.

24. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_2c22; fbr2_78i21;
their complements; and variants thereof.
25. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_llnm; amy2_1i1;
amy2_2g12; fbr2_78c12; tes3_10i16; tes3_31a10; their complements;
and variants thereof.

26. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_10h17; amy2_10p7;
amy2_12d7; amy2_2f18; tes3_11c22; tes3_11d21; tes3_21f24;
tes3_31j20; tes3_5k22; their complements; and variants thereof.

27. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: tes3_16b5; tes3_10i16;
tes3_31a10; tes3_11c22; tes3_11d21; tes3_21f24; tes3_31j20;
tes3_5k22; Tes3_10n10; Tes3_11e17; Tes3_12d18; Tes3_14i7;
Tes3_15n14; Tes3_16p3; Tes3_l c Ipl2; Tes3_21k14; Tes3_22i11;
Tes3_22124; tes3_26g3; tes3_30p6; tes3_11a17; tes3_17i21;
tes3_20h12; tes3_7n12; tes3_1e16; their complements; and variants
thereof.

28. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_11d2; amy2_121o17;
amy2_1i14; amy2_24c8; fbr2_78d4; tes3_11a17; tes3_17i21;
tes3_20h12; tes3_7n12; tes3_1e16; their complements; and variants
thereof.

29. A computer readable medium, comprising in electronic
form at least one nucleic acid or protein sequence of a clone
selected from the group consisting of: amy2_14m16; tes3_18n14;
amy2_1c12; amy2_2f22; their complements; and variants thereof.

30. A computer readable medium, comprising in electronic
form a nucleic acid or protein sequence of the clone fkd2_3k1.